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Tag der miindlichen Priifung 



Abstract 



We investigate various variants of the Hermitcan version of the Local Bosonic 
Algorithm proposed by M. Liischer. The model used is two-dimensional Quantum 
Electrodynamics (QED) with two flavours of massive Wilson fermions. The sim- 
plicity of the model allows for high statistics simulations close to the chiral and 
continuum limit. 

To find optimal CPU cost behaviour, we carefully scan a 3-dimensional para- 
meter subspace varying the approximation polynomial parameters n and e as well 
as the number of over-relaxation steps within each update trajectory. We find 
flat behaviour around the optimum and a modest gain with respect to the Hybrid 
Monte Carlo algorithm for all variants. Generally, the gain is slightly smaller for 
the reweighting method than for the acceptance step variants and quite different 
for plaquette-hke and correlator-hke observables. 

On the technical side, we demonstrate that a noisy Metropohs acceptance step 
is possible also for the Hermitean variant. The numerical instabilities appearing in 
the evaluation of the factorized form for the approximating Chebyshev polynomial 
are investigated. We propose a quantitative criterion for these instabilities and a 
reordering scheme of the roots reducing the problem. 

A different formulation of the Hermitean Local Bosonic Algorithm using an 
acceptance step which generally avoids these instabilities was recently proposed. 
We compare the CPU cost to that of Hybrid Monte Carlo and to the more standard 
Local Bosonic Algorithm variants and find compatible results. 

The more physical problem of topological charge sectors and metastability is 
addressed. We find no plateau in the effective pion mass if metastabilities become 
too large. 



Zusammenfassung 



Gegenstand der Untersuchung sind Varianten der Hermiteschen Version des von 
M. Liischer vorgeschlagenen Local Bosonic Algorithm. Das dabei benutzte Modell 
ist die zweidimensionale Quantenelektrodynamik (QED) mit 2 massiven Flavours 
von Wilson-Fermionen. Die Einfachheit des Modells ermoglicht Simulationen mit 
hoher Statistik nahe am chiralen und Kontinuumslimes. 

Um Bereiche mit optimalem CPU-Kostenverhalten zu finden, untersuchen wir 
einen 3-dimensionalen Parameter-Unterraum, wobei wir die Parameter des Nahe- 
rungspolynoms n und e sowie die Anzahl der Uberrelaxationsschritte pro Update- 
Trajektorie variieren. Wir erhalten dabei bei alien Varianten ein flaches Verhal- 
ten um das Optimum und einen kleinen Kostengewinn im Vergleich zum Hybrid 
Monte Carlo Algorithmus. Der Gewinn ist generell fiir die Reweighting Methode 
etwas kleiner als fiir die Akzeptanzschritt- Varianten und deutlich unterschiedlich 
fiir Plaquette-ahnliche bzw. Korrelator-ahnliche Observablen. 

Auf der technischen Seite zeigen wir, daB ein stochastischer Metropolis-Ak- 
zeptanzschritt auch in der Hermiteschen Variante moglich ist. Die numerischen 
Instabilitaten, die bei der Berechnung der Chebyshev-Naherungspolynome in der 
faktorisierten Form auftreten, werden untersucht. Wir schlagen ein quantitatives 
Kriterium fiir die Instabilitaten und ein Umordnungsschema fiir die NuUstellen des 
Polynoms, das dieses Problem reduziert, vor. 

Eine alternative Formulierung des Hermiteschen Local Bosonic Algorithm mit 
Akzeptanzschritt, die diese Instabilitaten generell vermeidet, wurde vor kurzen 
vorgeschlagen. Wir verglcichen die CPU Kosten mit dcnen des Hybrid Monte 
Carlo Algorithmus und der iiblichen Local Bosonic Algorithm Varianten und finden 
kompatible Resultate. 

Im Weiteren wird das mehr physikalische Problem der topologischen Ladungssek- 
toren und der Metastabihtat behandelt. Wir finden kein Plateau der effektiven 
Pionenmasse, falls Metastabilitaten zu groB werden. 
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Chapter 1 
Introduction 



The task of physics is the determination of all relevant quantities of Nature from 
basic "principles" . In the case of elementary particle physics the relevant guidelines 
which have proven to be successful are the symmetry and unification principles. 



Their application leads to the so-called standard model [pla61|| , incorporating 3 



of the 4 known basic forces, namely the electro-magnetic, weak and strong in- 
teractions. Basic degrees of freedom of the standard model are the fundamental 
fermions (quarks and leptons) and their interaction with each other is described 
via the exchange of bosonic gauge quanta. An important part is formed by the 
Higgs mechanism, generating mass for fermions and massive gauge bosons and 
introducing the scalar Higgs particle into the model. The theory is formulated as 
a quantised relativistic local gauge theory coupling the symmetry principle with 
the dynamics of fields ||Zin94| , |Che84|| . Main building blocks of the standard model 
are Quantum Electrodynamics (QED) with symmetry group U{1) (as part of the 
electro-weak sector U{1)<^SU{2)) and Quantum Chromodynamics (QCD), relying 
on the symmetry group SU{3). 

Yet the standard model leaves a lot to be explained. It has as many as 19 free 
parameters ||Don92[ and no reason is given why a symmetry of SU (3) ® SU{2) ® 
U{1) is realized in nature. Thus we would like a more advanced theory with fewer 
parameters and an even more symmetric structure. Possible ways of extending the 
standard model include the search for more fundamental constituents of quarks and 
leptons, string theory including supersymmetric partners to the known elementary 
particles and grand unified theories. In these higher dimensional symmetry groups 
are applied at very large energy scales, which for lower energies are broken down 
to the familiar standard model structure ||Sal90|| . 



One main research focus is the relativistic SU{3) gauge theory including dy- 
namical fermions. Quantum Chromodynamics (QCD) ||Fri73|| , which describes 
hadrons and nuclei under strong interaction with the help of quarks and gluons 
as fundamental degrees of freedom. The beta-function is known at 3-loop level, 
leading in the high energy limit (or equivalently at small distances) to asymptotic 
freedom, i.e. the appearance of almost unbound quarks. In that case, the running 
coupling constant is small enough to allow perturbative methods. Applying these. 
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there exist in this regime convincing predictions of the theory, which agree well 
with experiment ||lVlut87 . 



On the energy scale of the hadrons themselves, i.e. for the long distance part of 
the strong interaction, quasi-free quarks were never observed. The quark coupling 
in this regime is too large for perturbation theory around the free solution to be 
sensible. We therefore need non-perturbative methods to shed light on e.g. the 
questions of quark confinement and hadron masses. 

Among the attempts to reach non-perturbative insight into the structure of 
QCD approaches using QCD sum rules, large expansions, the Bethe-Salpeter 
equation or Discretized Light-Cone Quantisation [Pro91|| can be mentioned besides 
effective purely phenomenological models. For an overview discussion we refer to 
the textbook of Narison |[Nar89| . 

Compared to these approaches, lattice gauge calculations offer a way to results 
stemming directly from the use of first principles. Lattice gauge theory uses a path 
integral representation with a space-time grid regularisation with lattice spacing a 
|Wil74|| . The theory is set up in such a way that the naive continuum limit a — 



yields the desired continuum Lagrangian to leading order in the lattice spacing. 
This of course offers a certain freedom in the choice of the lattice theory. In this 
approach all quantities are measured in units of the lattice spacing or inverses 
thereof. Results have to be extrapolated to the continuum limit, in which the 
lattice spacing a is set to zero and the dimensionless lattice size L to infinity. In 
order to still have meaningful results, a continuous phase transition point has to 
be chosen for this procedure. At this point, all correlation functions diverge, so 
that results in physical units can still be finite. In this limit, we will therefore 
retain ratios of (e.g. mass) observables corresponding to relations between renor- 
malised quantities. An important point is that certain continuum symmetries (like 
e.g. Lorentz symmetry) can be broken on a space time grid. The restoration of 
these symmetries has to be checked in order to justify extrapolation results. For 
completeness we mention that in order to exclude a contamination of the results 
by finite size effects, one would like to send also the physical lattice extent a ■ L to 
infinity, a limit called the thermodynamic limit. 

One non-perturbative approach to lattice gauge theory obtains results with 
the help of Monte Carlo methods. Introductions can be found in ||Cre83| , [Kal9CI| , 
Mon94|| . To illustrate the success of this approach, we mention that a recent 
work was able to determine ratios of hadron masses to an error of 2% ||Aok97| . 



Nevertheless, we are still far away from a complete understanding of the physics 
of hadronic systems using lattice gauge theory. 

A severe problem appearing in simulations is the phenomenon of critical slow- 
ing down, i.e. the fact that the CPU cost of a simulations grows more than 
proportional to the space-time volume of the lattice ||Sok89 . Especially fermion 



simulations are very time consuming [[For97a| . They face the problem that the 
fermion interaction is inherently non-local, being described by a determinant in 
the path integral. Thus, up to now most large-scale calculations apply the so-called 
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quenched approximation, replacing this determinant by a constant. Thereby the 
dynamics of fermionic vacuum fluctuations interacting with a bosonic gauge field 
are ignored. 

The standard algorithm used to include dynamical fermions is the so-called Hy- 
brid Monte Carlo algorithm |Pua87||. Here a trajectory of small size update steps 



is generated introducing a fictitious computer time coordinate, conjugate momenta 
and the corresponding classical equations of motion. At the end of the trajectory 
an acceptance step is used to make the algorithm exact. Other approaches, e.g. 
based on Kramers equation ||Hor91| , [J an95|| , have not shown significantly better 
behaviour than Hybrid Monte Carlo. 

Recently, an alternative approach was proposed by M. Liischer |[Lue94|| . This 
so-called Local Bosonic Algorithm (LBA) applies an n-th order polynomial approx- 
imation to the fermion determinant and thus makes it possible to rewrite the deter- 
minant as a set of n local and bosonic integrals plus a non-local correction term. As 
simulations of locally coupled degrees of freedom are much easier, this could poten- 
tially reduce the computational task. This algorithm has generated considerable 
interest ||Jeg954 |For95| , Por96| , |Jan95a| , |Jan964 [Wol971| . Examples for applica- 
tions of the local bosonic algorithm are Monte Carlo simulations of lattice QCD 
t'br95, Jan96a], Supersymmetry | |Mon97b |, the Hubbard model | |Saw97 | and the 
Schwinger model both with staggered ||Pea94|| and Wilson fermions | |Els96a] , [Els97| . 

Of course, this new algorithm has to be tested and its efficiency compared to 
the standard Hybrid Monte Carlo algorithm (HMC) ||Bor96a] , |Jeg954 |Jeg96|| . In 
this context it may be noted that it is well known how to tune the parameters of 
Hybrid Monte Carlo routines, whereas this is still under investigation for the local 
bosonic case. 

For the proposed studies of algorithms for dynamical fermions that are rather 
time-consuming, we want to advocate the massive 2-fiavour Schwinger model (2- 
dimensional QED) ||Sch62| , Pet95|| as a low-cost laboratory ||Bar97|| . With the 
appearance of an axial anomaly, confinement, light pseudo-scalar and heavy scalar 
mesons, it has enough rich physics to be similar to QCD. One of the aims is to set 
up a program package facilitating further studies. 

This work introduces the Schwinger model variants commonly used, and gives 
some details on the lattice 2-fiavour Schwinger model. We will briefiy describe the 
Hybrid Monte Carlo algorithm applied to set the scale for the CPU cost comparison 
and the four variants of the Hermitean version of the local bosonic algorithm we 
are testing. We will discuss some problems regarding numerical instabilities and 
topological metastabilities encountered in our simulations. Finally, we give results 
of our investigation of the CPU cost of the plaquette and the pion correlator. 

Throughout this work, figures and tables are defered to the end of each chapter. 
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Chapter 2 



The model: Schwinger model — 
2D QED 



2.1 Continuum 

Quantum Electrodynamics (QED) in one space and one time dimension {d = 2) 
with Nf flavours of fermions having mass rua, a = 1 . . . Nf, is generally defined by 
the continuum Lagrangian in Euclidean space 

1 

^ = jP' + J2 + + ^aW] , (2.1) 

a=l 

where we use the standard conventions h = c = 1. The fermion fields are 
Grassmann 2-spinors having dimension [m] 2 and the electromagnetic tensor 

f-=(° f) (2.2) 

in 1+1 dimensions only includes the electric field E, as because of the missing 
transverse directions no magnetic field is existing. We point out that the mass 
and the electric charge are of the same dimension. 



For an introduction to Euclidean space we defer the reader to [Mon94|. Con- 
ventions are generally collected in App. |^. 

2.1.1 Massless 1-flavour model 

2-dimensional 1-fiavour QED with massless fermions, i.e. Nf = l,mi = 
presented by Julian Schwinger as an analytically soluble model in 1962 
This version is generally called the Schwinger model. 

Schwinger was able to show that a theory of a massless vector gauge field 
does not prevent the existence of a massive particle. The analytic solution allows 
only uncharged states. The thus formulated theory was therefore regarded as a 
model for complete charge screening by vacuum polarisation | pi.ot79| . This so- 




called "quark" trapping became an interesting object of study in view of quark 
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confinement | [Rot79| |. The fact that no electron excitations exist was confirmed in 
numerous works |[Low71|, [Kog75 . 



An equivalent formulation of the Schwinger model in boson fields using the 
mass parameter 



TJlB 



is given by a Lagrange density of the form ||Col75a[ |Col75| , |Col76| , [Kog75 



(2.3) 



(2.4) 



thus enabling us to interpret the spectrum as that of free bosons, the one-particle 
state with mass rriB corresponding to a quark- ant iquark pair, i.e. a meson. Above 
that we find a continuum of two meson, three meson . . . states beginning at the 
minimum energies of 2771 b, 'itriB, ■ ■ ■, where the corresponding mesons are at rest 
relative to each other. A pedagogical introduction can be found in | Dit86 |. 

A further feature of the Schwinger model is the appearance of a non- vanishing 
vacuum condensate (ipip). Naively we would expect conservation of electric as 
well as axial charge. Yet the phenomenon of an axial anomaly is observed | JBro63| , 
Kog75| . 



2.1.2 Massive 1-flavour model 



Much interest was concentrated on the variant of the Schwinger model with mas- 
sive fermions, Nf = l,mi = m 7^ 0, often called the massive Schwinger model. 
Using quasi-classical approximations, Coleman was able to confirm that also in this 
model no charged particles are allowed as asymptotic states ||Col75|| . One finds in 
the massless case decoupled, degenerate vacua upon which for each value of the 
parameter 6, corresponding to a constant electric background field Ec = ^6, the 
same spectra are built up. In four dimensions this field would be eliminated due to 
pair creation. In the case of two dimensions this is not possible; electron-positron 
pairs are only created until the background field falls below a critical value of 
-^c"* = |, so that 6 can be chosen to lie in the range [— tt, tt]. Coleman showed in 
a semi-classical calculation that also in the massive case the background electric 
field is decisive for the structure of the spectrum. In the weak coupling limit we 
find for the number of stable states 



N 



4m^ 



1 - ^ 



2v^ - ln(2 + V3)) +0(1) 



(2.5) 



which at vanishing coupling diverges as expected. In the strong coupling limit the 
result is one, two and three stable particles for the ranges |6'|>|,0<|6'|<| and 
9 = respectively [|Col75 . 

Basic to the expansion around the massless case is the proof that this limit is 
allowed and leads to the soluble massless Schwinger model | |Kog75| . The connection 



of these phenomena and topology was investigated in [|Par93|, |Man85| . 
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The massive Schwinger model can equivalently be described by an interacting 
boson field with the Lagrange density ||(Jol75| 



q1 

^^(j) ~ ^^B^^ + 'fnmB — cos(2-\/7r0) 



(2.6) 



where 7 = 0.577 ... is the Euler constant and itlb the mass parameter Eq. p.3| . An 
expansion in the fermion mass yields through comparison of the coefficients of cfP' 
for the mass of the lowest state |[Kog75| , |Var94 



Ml 



m 



B 



(1 + 2— e^) + 0(m2). 



(2.7) 



In the framework of lattice gauge theory the massive Schwinger model was investi- 
gated numerically using Hamiltonian methods [Kog75, CreSUj. Here besides mass 



eigenvalues of the lowest-lying states also the coefficients of the linear potential 
in the limit of small mass were obtained. The results are in good agreement of 
some percent error with the continuum theory ||Cre8CI|| . Discretized Light-Cone 



Quantisation ||Bro91|| was able to calculate whole mass spectra in some Fock space 
approximation [[E1187| , [Els95|] . These data were used to study finite temperature 



quantities and critical exponents of the massive Schwinger model |Els96 . 



2.1.3 Massless A^- flavour model 

The iV-flavour model can be solved analytically in the limit = 0, Va = 1 . . . 

Main predictions are the existence of a massive pseudoscalar isosinglet 



Het88 



state with mass 



m 



(2. 



and A^^ — 1 massless (Goldstone-like) states. We would like to point out that earlier 
work identifies only A^ — 1 pion states |[Aff86|| . Both the fermion condensate ('?/''?/') 
and the pseudoscalar density {ip'-^^ip) are predicted to be zero. 



2.1.4 Massive 2-flavour model 

Most useful for lattice investigations with Wilson fermions is the massive degen- 
erate 2-flavour model Nf = 2, mi = 1712 = m. It is flrst of all convenient for 
technical reasons as the effective probability after integration of the fermionic de- 
grees of freedom is manifestly positive. On the other hand, the 2-flavour model 
has the nice feature of being rather QCD-like, as was observed in several classic 
papers jCHTTSj pit^Bj , [H5igB| . 

The model describes light pseudoscalar isotriplet states (vr, analogous to the 
Goldstone particles of QCD), a pseudoscalar isosinglet state {rj, much like the rj' of 
QCD) and scalar mesons (oq, fo). For most of these the mass perturbation theory 
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is known to first order [|Het95|, lHar94 



m. 



V 



2 1 

2.066m3e3 



'IT 



(2.9) 



The rich meson physics therefore allows to set the scale via the pion mass and 
measure physical mass ratios, e.g. Fermionic densities are generally zero with 
the exception of the fermion condensate, which is found to be real. 



2.2 Lattice formulation 

The lattice version of the Schwinger model with two flavours of fermions of identical 
mass is defined using the positive effective distribution for the compact U{1) link 
variables [4^^ 

Peff[f/] oc detM^e-^^l^l (2.10) 

one obtains after integration over the fermionic Grassmann variables in the path 
integral Eq. and discretization of the fields ||Wil74|] . We use the standard 
plaquette action 

Sg[U]=(3Re E(l-^p) (2-11) 
p 

with P = the dimensionless lattice gauge couphng and plaquette variables 
Up X defined starting at the lower left corner of each plaquette 

Up . = UU^^-^ f/.+^,^ Ux^-, , (2.12) 

so that the lowest order expansion in the lattice spacing matches to the naive 
continuum limit. Barred Greek symbols signify the orthogonal direction. The 
Wilson fermion matrix is given by 

Mx,y = 6x,y - 5:(5._^,,(1 + Y')Ux-f.,, + 5.+A,y(l - inulJ , (2.13) 



where k = 2{mli+d) dimensionless Wilson parameter. Gamma matrices and 



details of notation are given in App. [A.4| . We remark that the lattice spacing a is 



only explicitly shown in exceptional cases to avoid confusion. Generally it is set 
to 1 in all formulae. 

Chiral symmetry is explicitly broken by the Wilson term. We remark that, as 
— 1 is in the centre of the U{1) group, periodic or anti-periodic boundary conditions 
for the links should be irrelevant. 
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The 2-flavour case ensures that the effective distribution to be simulated is 
manifestly positive. We work with the Hermitean fermion matrix |Lue94| 



Q = Cot'M (2.14) 

scaled so that its eigenvalues are in [—1, 1] using the scaling factor 

m + d 1 11 , , 

m + zacM l + 2aKCM 



Explicit formulae are given in App. |A.4 



The critical Wilson parameter is known in the weak coupling limit 

KciP -^00) = ^ (2.16) 

and expected to be larger for lower (3. It is still an open question whether simu- 
lations above Kc are possible. As this was not the focus of this work, we did not 



investigate this point. Preliminary results (e.g. Sec. |6.1|) suggest that reasonable 
results are difficult but possible in this regime. The second order phase transition 
point (and thus the continuum limit) is reached for f3 ^ 00 with lines of constant 
physics fixed via the demand to keep certain (e.g. mass) ratios constant. 

Observables 

A measurement of the average plaquette J2x Up x and of the temporal and spa- 
tial Polyakov loops averaged over the orthogonal direction Pl = -f 11^2=1 11x1=1 Ux,i 
and Pt = J J2xi=i 1112=1 ^a;>2 is directly possible from the definitions. 

Averages of local fermion bilinears (ipTip) can be measured applying a noisy 
estimator scheme, using random spinors ri{x) as sources for the solver as shown in 
App. |A.5| . One expects the pseudoscalar density (also called the chiral condensate) 



(ip'y^ilj) and the further densities {ip'-f^ip) to vanish within errors. The fermion con- 
densate (ipip) remains non-zero even in the massless limit because Wilson fermions 
break chiral invariance explicitly ||Wil74| . 



For the determination of meson masses operators with various quantum num- 
bers are defined 

• flavour triplet - {ip'^^Tip) 'vr', {ipT%l)) 'oq', 

• flavour singlet - {ip'-^^ip) 'rj\ (ipip) '/o' . 

Moreover, insertion of a 7°, e.g. {ip'f^'f^Tip) for the vr, leads to alternative operators 
with the same quantum numbers in the rest frame. This exhausts the Dirac algebra 
in two dimensions. The result is summed up in Tab. [A.l . 



The calculation of temporal correlators C^^(At) involves point sources at ran- 
domly chosen positions (x, t) and summation over spatial displacements y in the 
other time slice {y, t + At) to project out the zero momentum states. As to the 
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flavour-singlet channels, the disconnected piece is subtracted with the aid of the 
noisy inversions performed earlier for the measurement of the condensate | Uka95 |. 
For more details consult App. |A.5| . 

All measurements are analysed taking into account covariances and auto-correlations 
of the correlators ||Bra92|| . A binning method is applied to reduce data size. 

The scaling idea we are using is based on the following strategy. Imagine that 
m,r and the ratio ^ are fixed, e.g. by experiment. Working at a certain value of 
P, we will then have to adjust k, such that the ratio is the physical value, while 
sets the scale, determining the lattice spacing in physical units. The size of the 
lattice has to be chosen so that finite size effects are small. 

Repeating this procedure at larger /3 results in a smaller pion mass in lattice 
units, which again sets the scale for the lattice spacing and thus determines the 
scaling factor achieved. 

We remark that motivated by the continuum results we expect the masses to 
scale like 1/ ^/J3, so that we have to increase the value of j3 enormously if 

we want to double the lattice size and reduce the lattice spacing by a factor of 
2, keeping physical ratios constant by adjusting the k value. This unfortunately 
limits the usefulness of the Schwinger model, as very high f3 simulations encounter 
topological metastabilities ||Joo9(J|, |Dil93|, [li)ls97a|| . 
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Chapter 3 



The standard way: Hybrid Monte 
Carlo 



We are ultimately interested in deciding whether the new local bosonic algorithm 
can be faster than the "workhorse" Hybrid Monte Carlo (HMC) | pua87| | used 
for more than 10 years. We thus implemented a HMC code working with the 
Hermitean matrix Q to set the scale. A brief description of the formulae is given 



in Sec. 3.1 



In order to make a fair comparison we ulilize the optimisation procedures now 
common in the application of the HMC. Namely, the implementation includes 
trajectory length set by n ■ Ar = 1 and acceptance ~ 70%. We further use 
the fact that extending the solver solution in the trajectory via the condition 
x= to generate an optimal new start vector for the inverter results in a gain of 
approximately 20%. 

The inverter algorithm applied throughout is a standard Conjugent Gradient 
algorithm (CG) ||Ral72| , |Sto90|| used with precision 10~^. The decision not to use 
advanced inverters like BiCGgammaS for the observables was taken in order to have 
a guaranteed convergence for all simulation parameters. In order to avoid trouble 
with bad pseudo-random numbers, we use Liischer's high-quality random number 
generator (RG) ||Lue93|| , which has been shown to have a very long recursion period 
and efficient decorrelation properties. It is used in a vectorized form applying 257 
generators in parallel. 

The standard way to get portable results for CPU cost analyses is to count in 
matrix multiplication operations. These are the most CPU-time intensive parts of 
any realistic QCD calculation. We describe our explicit counting in Sec. [3^ . 

Generally, one uses this information to define the cost of an algorithm by quan- 
tifying the cost needed to generate two independent (i.e. decorrelated) configura- 
tions on which measurements are executed. Assuming an exponential decorrelation 
of configurations [ [Bra92 | an estimator for the distance of independent configura- 
tions is given by twice the integrated auto-correlation time Tint > 1- Thus this 
quantity measured in units of molecular dynamics time or respectively matrix 
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multiplications, 



2Ti„t[Q ops] = 2Ti„t • Nq 

ops/update ) 

(3.1) 

can be regarded as a reasonable estimator for the CPU cost of an algorithm. We 
would like to point out that although the autocorrelation time is dependent on 
the observable, it is generally of the same order of magnitude characterizing the 
update algorithm. 



3.1 Implementation 

The idea of HMC is based upon rewriting the partition function given by 

J D[U] detg2 e-^G(^) (3.2) 
using complex pseudo-fermion fields (neglecting constant factors) 

j D[U]D[r]]D[r)^]e-^°^^^-'^''^~''^ (3.3) 

and introducing real valued momenta 7rx,f^ conjugate to the links [4,^ with a kinetic 
term which is not changing the path integral 

Z = J D[U]D[r]]D[r]^D[7r]e-^<'^^^-'''^'''^-'^^' . (3.4) 

The thus defined action is called in analogy to classical mechanics the Hamiltonian 
of the system 

H^SG{U) + r}^Q-^r)+^7r\ (3.5) 

Applying this distribution, the complete dynamical update of the link configuration 
is done in a three step procedure 

1. • a complex Gaussian update for the field % = Q~^r) 
• a real Gaussian update for the field tt 

2. micro-canonical steps via discretised but reversible leapfrog integration of the 
equations of motion in fictitious computer time r leaving the Hamiltonian 
approximately constant 

dH 

= — ^0 (3.6) 

OT 

3. a Metropohs acceptance step correcting for discretisation errors in the leapfrog 
integration with 

min{l,e-^^} (3.7) 
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3.1.1 Micro-canonical update 

We need the update equations, also called equations of motion (EOMs), for the 
fields Ux,fj, and 7Tx,f^. The Hamiltonian EOMs can be chosen to fulfil the two re- 
quirements 

• Ux,n should stay in C/(l), 

• the Hamiltonian is constant in computer time, 
yielding a simple update process for the links 

Ux,ij, = 'iT^x,tiUx,ii (3.8) 
while the equations for the momenta are fixed via 

if = . (3.9) 

Writing this out, we find 

H = Sg{U) + rl^^[Q-^]r^ + tttt . (3.10) 

Our aim is to express Sg{U) and ?7^^[Q~^]77 in terms of the derivatives Ux,iJ, and 
T^x,iJ. to allow numerical integration of the EOMs. We rewrite the second term 



Q-2j ^ = -?7tQ-2 [qq + qq] (3.11) 

and introduce abbreviations for parts independent of the derivatives 

uj = Q-^r] and ^^Quj (3.12) 

to rewrite 

H = Sg{U) - 2Re[^^Qa;] + tttt . (3.13) 
We now calculate the dependency of the first term Sq of the derivatives 

So = -/3ERe(?7.,iC/.+i,2C/^i^i2 + C/.,i^/.+i,2t^U 

+ Ux,iU^+i,2UU,A2 + Ux^U^+i,2UU,iUl2) . (3.14) 

Using the EOMs we can rewrite 

Sg = -(^T.Mi^x^Ux,,u,^i,,ul^^,ul2 + Ux^i^^^^^ 

X 

-Ux,iU,+i,2^^x+2,iUl^2,A, - Ux,iU,^i,,Ul+^^,t7rx,2Ul2) (3-15) 
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and reorder 



X 



(3.16) 



XyfJ, 



defining thus another abbreviation $ independent on the derivatives 

$,,i = -i/3(C/p,-C/p,_2) and $,,2 = - - C/p ■ (3.17) 
The dependence of Q on the derivatives is given by 

= -CoK^(5^_^,j,7^(l + l^)iT^x-„,,JJx-ix,i. - (^x+M,j/7^(l - l^)iT^x,nUl^) ■ 
In the Hamiltonian this is used in the combination 



X,IJ. 



(3.18) 



defining the abbreviation Q,. The final Hamiltonian formula is therefore 

H = Yt^xAR^^x,^^ + Re Q^^i, + ir^A (3.19) 



giving the required EOMs 

Re{^x,n + ^x,n} + T^x,n = . 

3.1.2 Leapfrog integration 

Heuristically one finds optimal behaviour of the algorithm if the EOMs 



(3.20) 



X,tJ, 



TTo. 



'X,H Re"[$3;^^ + ^X,fJ,} 

are integrated for N steps of length At under the constraint 

N -At^I . 
Integrating the link EOMs, we find 

UxA^o + St) = C/.,^(ro)e*"-''^^ 



(3.21) 



(3.22) 



(3.23) 
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to explicitly stay in the group manifold and 

T^xA'^Q + = T^xA'^o) - (M^x,M + ^x,n}) ■ Sr . (3.24) 

Expressing this in steps n = 1 . . . iV in the link case and n = 1 . . . iV + 1 in the 
momentum case, we can write the result in a computer-friendly way 

U.An) = C/.,^(n - l)e--''(")^- (3.25) 
T^xA^) = 7r^,;.(^-l)-(Re{$^,^(n-l) + Q^,^(n-l)})-AT(n-l) 

with 

Ar(0) = Ar(7V) = ^ 

Ar(n) = At Vn e {1 . . . - 1} . (3.26) 

3.2 Counting in Q operations 

We give the number of Q matrix multiplications necessary for one update step of 
our HMC implementation. In the formulae we denote with Ncg the number of 
matrix multiplications needed by the Conjugent Gradient routine applying Q^. 
The Hybrid Monte-Carlo algorithm applies 

• a number A^it of leapfrog iterations per trajectory 

• a CG routine to invert the matrix using Ncg iterations to invert 
and thus uses 

• 2NiiNcG Q operations for the main inverter, 

• 2NcG Q operations for the initial inverter. 

• The final inverter can be omitted as the data is known from the trajectory. 

• 2 Nit effective Q operations to update the links 

so that we finally obtain a total number of matrix multiplications of 

ops/update 2NitNcG + 2NcG + 2Nit . (3.27) 
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Chapter 4 



The new way: Local bosonic 
algorithm 



4.1 Basic idea 

As an alternative to the Hybrid Monte Carlo algorithm, M. Liischer proposed 
a local bosonic formulation ||Lue94|] . The one main variant of the local bosonic 



algorithm (LBA) we consider uses the fact that we are dealing with a rescaled 
squared Hermitean fermion matrix, so that the eigenvalues are real and between 
and 1. 

We can then identically rewrite the path integral 

Z = j D[U] det[g^]P„(det[Q2]) (P„,(det[Q^]))"' e-^^l^l (4.1) 

using for Pn{s) any real polynomial of degree n which approximates 1/s in (0, 1]. 
Then 

det[g2p„(g2)] = det[l - P] ^ 1 (4.2) 

is a small correction which can be treated via a correction scheme, provided we 
can sample the remaining distribution. 

We now assume that the approximation polynomial is of the form 

n 

Pn{Q'') = N^orraX{{Q^ - Zk) , (4.3) 

fc=l 

where A^norm is a normalisation constant and the are the roots of the polynomial 
coming in complex conjugate pairs Zk-, Zk with non-vanishing imaginary part. They 
define /ifc, Uk and /ifc, z^fc via 

z^ = Hk + ii^k P'k = -fJ'k I'k = ^k (4.4) 



if one defines the branch of the square root to be taken by > 0. Using this we 
can rewrite 

n n 

PiQ') = A^norm H (Q' " ^k) = A^norm Y[ [{Q ' l^kf + ^^fc] • (4.5) 
k=l k=l 
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As this factorises, sampling this distribution can be done using 

J D[(/.]D[(/.t]e-5<^'^^ oc (det[v4])-^ (4.6) 
to transform the distribution to a bosonic integral 

Pes{U) (X det[l- R]e-^^^^^JV(|)e-^^M(■Q-^^^)'^^+-l'^'U^] (4.7) 

with n complex bosonic Dirac fields (pk- 

The main criteria for the polynomials Pn{s) is that they should be fast conver- 
gent to ^ in (0, 1] and have roots coming in complex conjugate pairs. We chose as 
approximation polynomials the Chebyshev-derived polynomials proposed by Bunk 
et al.[lBun95a 



with 



X = 2 ^ - 1 , (4.9) 

such that s G [e, 1] is mapped to a; G [—1,1]. The polynomials T„(z) of degree 
n > are the standard Chebyshev polynomials given by 

Tn{z) = coshn0 with 2; = cosh0. (4-10) 

For completeness, we give some of the first polynomials 

Toiz) = 1 , T,{z) = z , niz) = - 1 , 

Ts^z) = Az^ - 3z , T^iz) = 8z'^ - 8z'^ + 1 . (4.11) 

An important feature is the Clenshaw recursion relation 

Tn+i{z) + Tn-i{z) = 2zTn{z) foT n > 1 , (4.12) 

known to be numerically stable. For more information we defer the reader to 



The polynomials P„(s) approximate 1/s for real s G [e, 1]. The rest term 
Rn{s) quantifies the quality of the approximation and the normalisation factor p„ 
is defined through the condition that Pn{s) exists even at s = via 

= sPn{s)\s=0 = 1 + PnTn+l ("Y^^) ^ ^" " T (-1+^) ' '^^'^^^ 



The explicit expression for the error term 



Ris) = '-t'y (4.14) 



can be used to deduce limits for the approximation quality 



\Rn{s)\ < \pn\ = I — , \ ,^ 1 1 with coshx = (4.15) 
cosh[(n + l)x\ 1 — e 



yielding 



-^osh[(n + l)x]+sinh[(n + l)x] " ^1 + V^) ^^"'"^^ ^^'^^^ 

proving an exponential convergence to 1/s for s G [e, 1]. Explicitly, we use as the 
approximation quality factor [ Lue94 , Bun95a ] 

Not necessary for the application in the local bosonic algorithm, but convenient 
is the fact that the roots of this approximation polynomial can be given explicitly 
observing that 

Pn{s) = ^ T„+i(a:) =T„+i(-i^) (4.18) 

for complex s,x. This yields 

= -(1 + e) - -(1 + e) cos(^) - sin(-^) , (4.19) 

i.e. n roots on an ellipse in the complex plane around the foci e and 1 on the real 
axis. 

We further can deduce the global normalisation constant A^norm from the sym- 
metry of the Chebyshev polynomial in [e, 1] 

r„,,(i±i).o ^ i±ip„(i±i)-,.o («o, 

to 



N-L. = ^ ft - ^0 • (4-21) 



2 ,=1 



The updating process consists of exact heatbath sweeps for the 0's and U's 
|Bes79|| , followed by a number of over-relaxation iterations. The formulae for the 
updates are given in App. 0. We confirmed in preliminary runs that the reflection 
sweeps for the 0's and ?7's have to be combined in pairs, as was observed before 



Jeg95| , |Jan96|| . In total, this introduces the parameters ra, e and the number of 



reflections per heatbath into the algorithm. 
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4.2 Correction step 



Finally, the approximation det[l — i?] ~ 1 has to be controlled. This in principle 
can either be done via a Metropolis accept/reject step with correction probability 

_^;.ri det[l-P^ 
det[l -P] 

or by reweighting 



P^^^^,^, = min[l, -j-^L _1] (4.22) 



where denotes the expectation values in the local bosonic ensemble and 

primed quantities are of the new configuration. 

We would like to point out that algorithms including an acceptance step need 
the reversibility of the update trajectory to guarantee detailed balance. This leads 
to a symmetrization of the update trajectories in these cases, which has to be 
included in the CPU cost derivation. 

On the other hand, the ensembles generated by the local bosonic distribution 
with and without correction are distinctly different. In simulations (det[l — R])l 



can in fact be far from 1 as will be shown in Tab. |6.5| . The integrated autocor- 
relation time can in these cases be small, while the error from each measurement 
grows to very large values. We thus can no longer use the autocorrelation time 
measured in matrix multiplications 

2rint[Q ops] (4.24) 

described in Ch. ^ as a good transportable indicator for the CPU cost of the 
algorithms. We have to regress to the basic effective cost defined by 

Ceff = -^total Q ops ' ~~r — \ (4.25) 

applicable to all measurements as only the total work and the real result (i.e. the 
error obtained for a certain observable) are used. The cost factor is normalised 
by the mean value of the observable to cancel the trivial dependency. We remark 
that this cost factor is even more observable-dependent than the standard choice 
Tint, prohibiting even a rough comparison of different observables. 

In a first step we approximatively included the correction via the lowest 8 
eigenvalues of Q^, using them to apply a global Metropolis correction step |[Ale95 



Recently, stochastic methods which are exact were suggested | Bor96 |. Some vari- 
ants of these will be discussed in the next sections. 



4.2.1 Reweighting 



The correction factor can be treated exactly using a stochastic method. Following 
a suggestion by Liischer ||Lue96| | , we rewrite 

(Odet[l-i?])i 



{det[l -R])l 



(4.26) 
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into 

We then interpret the factor e"''^'' as a Gaussian weight, resulting in 

(O) = = 7^ , Z^''' , (4.28) 

where (. . denotes the ensemble generated by the LBA distribution and the 
Gaussian noise vectors. Note that rewriting 

l-{l-R)-^ ^ -R{l~R)-^ (4.29) 

is numerically more stable as it avoids subtracting two almost identical objects. 

4.2.2 Noisy acceptance step method I 



The idea behind this noisy method ||Bor96|| is to generate a new configuration 



{U\ (j)') with the standard Liischer action and then to accept it according to a 
correction probability that is a function of a noise vector rj 

P^JS^^U'^^V) (4.30) 
satisfying detailed balance on average over the rj distribution 

m]P^u',^')^iuA^) det[l-i?] • ^ • ^ 

In order to do this one generates random numbers according to the distribution 
pHB ^^]^[Q^ ig constructed to give the required determinant factor. The acceptance 
of the Metropolis is fixed so that the total acceptance 

P(u,<t>)^{U',(t)')iv) = P^^P(u,(t>)^{U',(P')iv) (4.32) 
satisfies detailed balance. The almost obvious choice for the heatbath distribution 

"det[l-i?]" ^^-^^^ 
with A^n a normalisation constant requires an acceptance 

^{l<A)^(f/',<^')(^) = e-^^[^-«']-^''+''^[^-«]-^'') . (4.34) 

The proof of detailed balance can be seen from the symmetrical behaviour of the 
involved integrals 

I[dv] Ptu',<p')-.(u,d^) I[dfj] ^j^^[^e-''t[i-«']-^'7min(l, e-^ni~R]-'n+vni-R']-^v^ 
det[l - R'] lidv] mm{e-^'i'-^r'v^ ^-vHi-R'r'v^ ^ ^^^^^ _ 



det[l - R] J[dfj] min(e-^^Ii-^'l-'^ e-f>n^-R]-'v^ det[l - R] ' 
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(4.35) 



In order to make the algorithm practically working, we transform the heatbath 
distribution P^^ into a simple Gaussian using the substitution 

V = Bx (4.36) 

with B and i?^ given by the constraint 

B^B = 1-R. (4.37) 

We remark that while this transformation can be done trivially in the non-Hermitean 



case 



Bor96|| , we here need a little trick. The decomposition can be done easily 



using the product formula 



n/2 

1-R = N^ormQ' UiQ' - ^k){Q' - zu) (4.38) 

k 



taking one root of each complex conjugate pair to yield 

n/2 



B = N^o,r.Q\{{Q^ - zu) . (4.39) 

k 

Finally we obtain 

P^ = iv^e"^'^'[^^']"^^ = N^e-^'^ (4.40) 

and 

Ptu,,)-^iU',,') = min(l,e--^^ni-^'r^i^x+x^x). (4.41) 



4.2.3 Noisy acceptance step method I with adapted preci- 
sion 

This method makes use of a possible optimisation for Metropolis-type acceptance 
step schemes using a restartable solver like the Conjugent Gradient chosen for this 
study. 

As the random number governing the Metropolis decision is known before the 
solver is applied, it is in principle possible to interrupt the solver iterations and 
check whether the quality of the solver solution is already good enough to fulfil the 
requirements of a clear-cut decision, i.e. to distinguish the result from the chosen 
random number. If not, the solver is restarted and this procedure iterated. 

We implemented this idea in a simplified way. We demand a very limited solver 
precision of 10~^ in the first step and check if this quality is good enough. If not, 
we in a second step demand full solver precision of 10~^. 
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4.2.4 Noisy acceptance step method II 

Knowing of the idea of using Gegenbauer polynomials [Pun97|| to solve equation 
of the type 

A^x = y (4.42) 

as explained in App. we want to rewrite the formulae of the noisy Metropolis 
update section, resulting not in the acceptance Eq. |4.41| but in the corresponding 

PiU^iU',,, = min(l,e->^^[^-^'l-(i-i?.)[i-«rix+xtx), (4.43) 

where now we effectively do not need the square root of Q^P{Q^) but the inverse 
square root. 

It is again easy to see the idea. One starts with a heatbath distribution includ- 
ing a determinant factor (but now in the numerator) 

P^"" = det[l - i?^-'''^^-^')'' . (4.44) 

The acceptance needed in this case is 

PiU-iU',,')iv) = min(l, e-''^(i- Wd-i?').) . (4.45) 

We skip the proof of detailed balance as it is identical to the method I case. 

We apply the same transformation trick eliminating the Gaussian vector 77 via 
the substitution 

V=[l-RTh (4.46) 

given by the application of the Gegenbauer solver mentioned above. 

Thus we again constructed a purely Gaussian distribution for the x variables 

yet the acceptance is now 

Pm)^iU',4>') = min(l,e''^ («-«')"). (4.48) 

This formula makes it evident that in this scheme the main work is done by 
the Gegenbauer solver constructing the rj vectors. Thus its usefulness relies on the 
fact that the convergence rate of the Gegenbauer solver is equivalent to that of the 
Conjugent Gradient [Pun97|| . In Ch. |^ we will give results of the first simulations 
with the Gegenbauer solver. Details of polynomials and solver scheme are given 
in App. 1^. 

We would like to remark that an optimisation of this method analogous to 
that of the acceptance step method I with adapted precision is possible. The 
Gegenbauer inverter is not restartable, but a slight alteration storing some shift 
vectors in the iteration could be implemented without creating too much overhead. 
We thus could interrupt the solver iteration, check whether the solver quality is 
already good enough for the requested Metropolis decision, and continue if this is 
not yet the case. For reasons of limited computer resources this was not included 
in this study. 
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4.3 Counting in Q operations 

Our implementation of the Liischer local bosonic algorithm applies in the corrected 
measurement case 

• A^refl reflections 

• 1 heatbath step 

for the n bosonic flelds and the link field, while in the acceptance step cases twice 
that amount is necessary to symmetrize the trajectory. We use an abbreviation 

• -^sym — 1 for corrected measurement and 

• -^sym — 2 for acceptance step schemes. 

For the boson field correction part, both ways have to invert the approximation 
polynomial Q'^P{Q'^) which takes 

2iVcG(n+l) (4.49) 

Q operations. Heatbath and refiections for the boson fields both require 

• 1 Q operation to initialise the stored auxiliary fields 

• 1 Q operation in the force calculation routine 

• 1.5 Q operations in the update routine 
so that we obtain 

3.5 • nA^sym(A^refl + 1) + 2NcG{n + 1) . (4.50) 
The link updates take in 2 dimensions for heatbath and reflection each 

• 2 Q operations for the staples 

• 2 Q operations for the bosonic force parts 

so that wc end up with the total number of matrix multiplications necessary for 
one update step of 

ops/update — 

[3.5n + 4] • A^sym(A^refl + 1) + 2{n + l)NcG . (4.51) 
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Chapter 5 

The problems I: Instabilities 



In the update algorithms of Ch. ^ we encounter the basic numerical problem of 
evaluating a matrix-valued polynomial of high order. To be specific, we will in 
the following consider the problem evaluating a polynomial approximation of the 
inverse determinant 

detA ^ [detP„(A)]~^ , (5.1) 

using the Chebyshev polynomials as defined in Ch. [4.1| . 

The naive idea would be to use the factorized form Eq. O. However, the nu- 
merical construction of a polynomial using the product representation, can - due 
to rounding errors - easily lead to a loss of precision or even to numerical insta- 
bilities. This holds in particular if computers with 32-bit floating point precision 
are used. 

Often, as in the case of Chebyshev polynomials, numerically stable recursion 
relations are available (viz. Ch. 4J.) | Fox68|] . However, exact versions of the LB A 
|Bor96| ] or related approaches like the Polynomial Hybrid Monte Carlo (PHMC) 
algorithm |[t'br97b| , [t're97a|| often need the factorized form of the polynomial P„. 
Especially a decomposition of the polynomial into two or more (e.g. complex 
conjugate) parts can in general not be done without recursion to the factorized 
form. This is the numerical problem we are faced with in the Hermitean LBA 
variant using a Metropolis acceptance step. It is our intention to investigate in 
these cases several ordering schemes for the complex roots. 



5.1 Factorized Chebyshev polynomial 

Let us consider the Chebyshev approximation of a function f{s) depending on 
a real variable s by a polynomial Pn{s) of degree n. The motivation to initially 
study a single degree of freedom is that we might think of the matrix A as being 
diagonalized. Then the problem, Eq. reduces to finding a polynomial that 
approximates each X~^{A) separately, where X{A) is a real eigenvalue of A. We 
therefore expect that studying a single degree of freedom can provide information 
also about the qualitative behaviour of rounding error effects when the matrix 
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valued polynomial Pn{A) is numerically computed. 
In principle evaluating the partial product Pq{s) 



(5.2) 



fe=l k=l 



one has to define a normalisation A^norm the partial products, effectively dis- 
tributing the global normalisation constant A^norm to n fc-dependent normalisation 
constants n^. 

Considering precision losses or numerical instabilities can be understood by the 
following argument: The absolute values of the, in general complex, subsequent 
partial products |-P5(s)| and |Pg+i(s)| can be different by orders of magnitude 
if s ~ Zq. If now |P^+i(s)| ^ |-Pg(s)| then this must have been achieved by 
subtracting two large numbers, which bears the danger of a significant loss of 
precision. The problem itself suggests, however, its solution: The monomial factors 
in Eq. |]3|or equivalently the roots Zk should be ordered, if possible, in such a way 
that the absolute values of all partial products |P^(s)| have the same order of 
magnitude. Regarding an application to vectors (where s is a priori unknown), 
this has to be achieved in an s-independent way. 

To investigate the effect of reordering, we propose to use a simple criterion 
to determine the effects of these rounding errors. The idea is to evaluate in a 
first step for given q the maximal and the minimal value of |Pq^e('5)| over the 
spectral interval < s < 1. The ratio of the maximum to the minimum value, 
i.e. Rq = maXsg[o,i] |Pg_e(s)|/ minse[o,i] |Pq,e(s)|, is then a measure of how large the 
fluctuations of the absolute values of the partial products can become. If these 
fluctuations are very large, the polynomial can not be constructed in a safe way. 
Building the maximum of Rq with respect to q, we arrive at the final quantitative 
measure 



It is clear that Pmax has to be smaller than the inverse relative accuracy on a 
given computer as a necessary condition for the stability of the evaluation of the 
full polynomial. 

Another quantity of interest is the maximum value of the partial products itself 



This has to be be smaller than the largest representable number in order not to run 
into overfiow. To avoid underfiow one should thus also study Mmin by replacing 
the maximum in Eq. ^.4| by the corresponding minimum. We, however, restrict 
ourselves to the maximum quantity. 

Note that Pmax and M^ax are computed for s G [0, 1], whereas the Chebyshev 
polynomial has an exponential convergence only in the interval s G [e, 1]. However, 
as will be explicitly demonstrated below, our results for Pmax and M^ax do not 
depend very much on the choice of the lower end of the interval. 




(5.3) 




(5.4) 
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5.2 Ordering schemes 



In this section we introduce the different ordering schemes used for the roots 
Eq. |4.19| . In principle, one could try to also distribute the normalisation constants 



Uk introduced above in a /c-dependent way to reduce rounding errors. However, 
we found no improvement over the naive homogeneous distribution 

Uk = (A^norm)" (5.5) 

which we therefore adopt throughout the rest of this work. 
Naive ordering 

As naive we regard the ordering given by 

zr'" = Zk,k = l,---,n (5.6) 



with the Zk given in Eq. |4.19| . The roots z^ form an ellipse in the complex plane 



and in the naive ordering the roots are selected from this ellipse by starting at 
the origin and moving around anti-clockwise. This is indicated in Fig. p.l| a, where 
the roots are shown labelled according to the order in which they are used in the 
evaluation of the Chebyshev polynomial Eq. [4.3| . As we will see later, ordering the 
roots in this naive way gives rise to substantial rounding error effects, even leading 
to numerical overflow. 

Zurich group scheme 

Recently, De Forcrand brought to our attention the scheme used for the simulations 
of the Ziirich group | [I''or97| | , which consists of using the complex pairs Sk, S(^n+i)-k 



as the 2k — 1 and 2fc-th roots. Because of results generally worse than the naive 
scheme, we decided to simply state that this scheme is not to be recommended, 
and not to include it in the plots. We would further like to stress that the studies 
undertaken by the Ziirich group were of a kind not influenced by the quality of 
the ordering scheme. 

Pairing scheme 

A first improvement over the naive ordering is to use a simple pairing scheme, 
reordering the roots 

4''" = ^j{k) , = 1, . . . ,n . (5.7) 

We give the reorder index j{k) for the example of n/2 being a multiple of 4 and 
n' = n/8. In the lower half plane, Im Zk < 0, the pairing scheme is achieved by 

( n n n 

' = U'2'4+^'4' 

n n n 



n',--n' + l,--n',- + n' -1\ (5. 
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and for Im Zk > correspondingly. An illustration of the ordering in the pairing 
scheme is shown in Fig. f).lp . The label of the roots indicates in which order they 
are used in the numerical construction of the polynomial. 

In case oin/2 not divisible by 4, we search for the next integer m smaller than 
n/2 and divisible by 4. We then repeat the above described procedure on these m 
roots and simply multiply the remaining roots Zm+i ■ ■ ■ Zn/2 at the end. To make 
our procedure explicit, we include the Fortran code to reorder the first half of 
the roots in App. p.l| . The code for the second half of the roots is constructed 
analogously. The Fortran code also contains the case of n/8 being odd, where the 
above described procedure has to be slightly modified. 

Subpolynomial scheme 

The problem with rounding errors in computing the polynomial in the product 

representation arises most severely for a high degree n of the polynomial. In 

order to decrease the effects of rounding errors one may therefore be guided by the 

following intuitive observation. Let us consider the polynomial Pn,e{,s) with roots Zk 

and 1. If m is an integer divisor of n, the roots zi, zi+m, zi+2m, ■ ■ ■ zi+(^n/m-i)m 

turn out to be close to the roots characterising the polynomial P„/^e(s) of degree 

n' = n/m (note that we keep the same e). Moreover, the normalisation constants 

i_ 1 
Ck = (A^norm('^)) " and n'l^ = (A^norm(^')) ^re of the same order (the dependence 

on n of ?7.fc turns out to be negligible for large n). Then the product 

n/m—l 

" = n [Cj+ll-S - Zi+jm)] (5.9) 
j=0 

is a good approximation of Pn',e{s), |u — P„/,e(s)| <^ 1 for all e < s < 1, and |m| < e. 
The same argument may be repeated for the other similar sequences of roots, like 

This means that the product Eq. may be split in a product of m sub- 



products, in such a way that each of them approximates the factorized form of 
a polynomial P„'_e(s) of lower degree n' = n/m. Because of the lower degree of 
the polynomial given by the products such as Eq. pl9| , one may expect that much 
smaller fluctuations occur in the intermediate steps of the evaluation of each of 
these subpro ducts. 

The reordering of the subpolynomial scheme 

zf = Zj(k)]k = 1,. . . ,n (5.10) 

can be represented by 

j = l,l + m,2 + 2m, ...,1 + ( l)m. 



m 

Ti 

2, 2 + m, 2 + 2m, . . . , 2 + ( l)m, 

m 



m,m + m,m + 2m, . . . ,m + { l)m> , (5.11 

m J 
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where m is an integer divisor of n. 

We found that m has to be chosen m ^ ^/n to avoid severe loss of precision in 
the construction of the polynomial and to reduce rounding error effects to a toler- 
able level. We remark that the naive ordering is reproduced by the two extreme 
choices m = 1 and m = n. The Fortran code which generates the subpolynomial 
ordering of the roots may be found in App. p.2 . 



Bitreversal scheme 

The subpolynomial scheme can be generalised, leading to what we will call the 
bitreversal scheme. To illustrate how this scheme works, let us assume that the 
degree n of the polynomial is a power of two. One now starts with the n monomial 
factors in Eq. [4.3| , chooses m = n/2 and applies the subpolynomial scheme resulting 
in m binomial factors. We then proceed in choosing am' = m/2 and again 
applying the subpolynomial scheme to these m binomial factors which leaves us 
with m'/2 subpolynomials each of degree four. The procedure can be iterated until 
we are left with only one subpolynomial having the degree of the polynomial itself. 
The above sketched procedure can be realized in practise by first representing the 
integer label (counting from to n — 1) of the roots in the naive order by its bit 
representation. The desired order is then obtained by simply reversing the bits 



in this representation. The resulting reordering of the roots is shown in Fig. |5.1| c 
with n = 16 as an example. 

For n not a power of 2, we pad with dummy roots, chosen to be zero for instance, 
until the artificial number of roots is a power of 2. The bitreversal procedure can 
then be applied as described above. Afterwards, the dummy roots have to be 
eliminated from the sequence. 

To make the procedure of reordering the roots explicit, we again give the For- 
tran code used to generate the bitreversal ordering in App. p. 3 . 



Montvay scheme 

Recently, Montvay ||Mon97|| suggested to order the roots according to an opti- 



misation procedure which can be implemented numerically, e.g. using algebraic 
manipulation programs such as Maple. Let us shortly sketch how Montvay's order- 
ing scheme works and refer to ||Mon97 | for details. We assume that we have already 



the optimised order of the roots for the partial product Pg<n(s), Eq. pl2| . Then 
the values of \sPq{s){s — z)\ are computed for all z taken from the set of roots not 
already used. The values of s are taken from a discrete set of points, {si, . . . , stv} 
which are all in the interval [e, 1]. Now, the maximal ratio over s G {si, . . . , Sat} of 
all values \sPq{s){s — z)\ is computed for each root z separately. Finally, that root 
is taken which gives the lowest of these maximal values. Starting with the trivial 
polynomial Po{s) = 1, this procedure obviously defines a scheme with which the 



roots can be ordered iteratively. We show in Fig. p.l| d the resulting order of the 
roots using Montvay's scheme by again labelling the roots in the order as they are 
used to compute the Chebyshev polynomial Eq. 0|. 
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Clenshaw recursion 

We repeat the statement that the evaluation of a Chebyshev polynomial normally 
does, of course, not rely on the product representation, Eq. ^^31 - A numerically 
safe way to construct a Chebyshev polynomial is to use a recursion relation, such 
as the Clenshaw recursion Eq. |4.12| described in Sec. |[Fox68|| . The recursion 



is known to be numerically very stable and will serve us in the following as a 
reference procedure for the numerical evaluation of a Chebyshev polynomial. 



5.3 Scalar numerical tests 

In this section we will give our numerical results for the quantities -Rmax, Eq- 
and Minax, Eq. |^ . When these quantities assume large values, the numerical 
evaluation of the Chebyshev polynomial Eq. |4.3| is affected by rounding errors 
and precision losses or numerical instabilities can easily be encountered. All re- 
sults presented in this section are obtained on computers using 64-bit arithmetic. 
Montvay's ordering scheme is constructed applying a Maple program ||Mon97a[| re- 
quiring 40 digit precision. We want to emphasise at this point that the quantities 
-Rmax and Mmax, which we are using to test the different ordering schemes, do not 
directly correspond to the values |sPg(s)(s — z)\, which serve to optimise the root 
ordering in the Montvay scheme. 

In order to compute the values for -Rmax, Eq- aiid M^ax Eq. |5]^ we take 
5000 values of s, equally spaced in the interval [0, 1]. We check explicitly that the 
values of -Rmax do not depend very much on the lower end of the interval [smin, 1] 
from which s is taken. In Fig. ^.2| we show -Rmax as a function of the lower end 
of the interval Smin measured in units of the parameter e. We present data for 
three polynomial degrees n = 30, n = 86 and n = 146 at fixed approximation 
quality 6 = 0.001 using the bitreversal ordering scheme. As Fig. |5.2| shows, the 
dependence of -Rmax on Smin is very weak. For the other root ordering schemes we 
find a very similar behaviour of -Rmax as a function of Smin- This justifies the use 
of Smin = which we have used for the numerical tests described in this section. 

We start by comparing the subpolynomial and the bitreversal schemes, as they 
are closely related to each other. In Fig. ^]3| we show -Rmax and Mmax as a function 
of the degree n of the Chebyshev polynomial, keeping the maximal fit accuracy 
S = 0.1 constant by adjusting the parameter e accordingly. For the subpolyno- 
mial scheme, the divisor m is chosen to be m ~ -y/n. Fig. ^]3| clearly confirms 
the expectation that the bitreversal scheme, considered as a generalisation of the 
subpolynomial ordering scheme, gives smaller values of -Rmax and Mmax- For de- 
grees of the Chebyshev polynomial n > 40 rounding error effects are substantially 
suppressed in the bitreversal scheme compared to the subpolynomial scheme. 

In Fig. we show the values of -Rmax and Mmax for the bitreversal, the naive, 
the pairing and the Montvay schemes for the case of 5 = 0.001 as a function of 
n. We have also performed numerical tests at (5 = 0.1 and 6 = 0.01 where we 
found the same qualitative behaviour of -Rmax and Mmax as a function of n for 
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the different ordering schemes. The first striking observation in Fig. p]l is that 
with the naive ordering one obtains aheady for moderate degrees n ~ 30 of the 
polynomial large values of -Rmax and Mmax, indicating that rounding error effects 
are becoming severely problematic. Clearly, in the naive ordering scheme round- 
ing errors can lead to very large ratios of particular partial products. Using the 
naive scheme, especially on machines with only 32-bit precision, a safe evaluation 
of the Chebyshev polynomial in the product representation can certainly not be 
guaranteed. 

The behaviour of the values of -Rmax and Mmax obtained by using the naive 
scheme demonstrates the necessity of finding better root ordering schemes which 
are able to reduce rounding error effects in evaluating the factorized Chebyshev 
polynomial. That such ordering schemes do exist is also demonstrated in Fig. |5.4| . 
For n < 100, the values for -Rmax and Mmax obtained from the pairing, bitreversal 
and Montvay schemes are close to each other and many orders of magnitude below 
the ones of the naive scheme. However, for n > 120, the values of -Rmax from these 
ordering schemes also start to deviate from each other. Taking -Rmax as a measure 
of the effects of rounding errors, it seems that the bitreversal scheme can reduce 
the rounding error effects most efficiently among the ordering schemes investigated 
here. 



5.4 Matrix valued numerical tests 

We report on a direct test of the ordering schemes described above for matrix- 
valued polynomials. As these tests were done using an existing program running 
on the powerful massively parallel Alenia Quadrics (APE) machines, we briefly 
detour to 4-dimensional lattice QCD [[Mut87|| , with details given in App. |^. 



The task is to construct the matrix valued Chebyshev polynomial of the oper- 
ator Q'^ 

Pn{Q^) = f{nk{Q^-zu) , (5.12) 

fc=i 

where the roots Zk and the normalisation constant Uk are given by Eq. [4.19| and 



Eq. respectively. We evaluated the polynomial, Eq. ^.12| , using the Clenshaw 



recurrence as well as using the different ordering of roots described above. 

The numerical tests are performed on thermalized configurations on 8^ x 16 
lattices using the APE computers, which have only 32-bit precision. Simulation 
parameters were chosen to be /3 = 6.8, k = 0.1343 and Csw = 1.42511. They 
correspond to realistic parameter values as actually used in simulations to deter- 
mine values of Csw non-perturbatively ||Jan97|| . We adopt the same (Schodinger 



functional) boundary conditions as described in [[Lue971| for the evaluation of Cgw 
For the above choice of parameters and setting cm = 0.735, the lowest eigenvalue 
of is Amin = 0.00114(4) and the largest is Amax = 0.8721(3). Investigations are 
performed at values of (n, e) (16, 0.003), (32, 0.003), (64, 0.0022) and (100, 0.0022). 
At each of these values of (n, e) we have 0(50) configurations. 



31 



We apply the matrix Q"^ Pn^eiQ'^) , which should be close to the unit matrix for 
our choices of n and e, to a random Gaussian vector Gaipha,s{x), which is a complex 
vector, located at a lattice point (x) and carrying colour a = 1,2,3 and spinor 
s = 1, . . . , 4 indices. We then compute the vectors 

'^'Clenshaw = Q^Pn,e{Q'^)G , (5.13) 

where Pn,t{Q'^) is constructed via Clenshaw's recurrence relation and 

border = Q^PnAQ^)G , (5.14) 

where now P„,e(Q^) is evaluated using different root ordering schemes, and order 
stands for naive, pairing, bitreversal and Montvay. On a given configuration and 
for given G we finally determine 

A = ||$Clenshaw " border T • (5.15) 

Since the Clenshaw recurrence is the numerically most stable method to evalu- 
ate the Chebyshev polynomial, the values of A provide a measure for the effects of 



rounding errors. The result for A as a function of n is shown in Fig. ^.5| . Using the 
naive ordering scheme, we could not run the cases of = 64 and = 100 as we hit 
numerical overflows. For the pairing scheme we find large values for A at n = 64 
and n = 100. The bitreversal scheme gives small, but non- negligible values of A 
for all values of n used. Finally, Montvay's scheme gives A ^ 10~^ for all values 
of n. Surprisingly, using roots ordered by the Montvay scheme, the construction 
of the Chebyshev polynomial can be done with a stability that is comparable to 
the one using the Clenshaw recurrence. 
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Figure 5.1: The roots with k = 1, ... ,16 and e = 0.1 are shown in 
the complex plane. Labels of roots indicate in which order they are 
used for the numerical evaluation of the Chebyshev polynomial within 
each ordering scheme. We show in a) the naive, b) the pairing, c) the 
bitreversal and d) the Montvay root ordering scheme. 
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Figure 5.2: The ratio Rmax is shown as a function of the lower end Smin of the 
interval [smin, 1] from which the values of s are taken to compute i?max- Smin is 
measured in units of the parameter e. Wc show data for three degrees of the 
Chebyshcv polynomial n = 30, n = 86 and n = 146 at fixed approximation quality 
S = 0.001, using the bitreversal scheme. Although different in magnitude, the flat 
behaviour of i?max as a function of Smin is very similar for the other schemes. 
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Figure 5.3: Ratio i?max and maximum factor Mm^x are shown as a function of 
the degree of the polynomial at fixed approximation quality S — 0.1. We compare 
subpolynomial and bitreversal ordering schemes. 
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Figure 5.4: Ratio -Rmax and maximum factor Mmax are shown as a function of the 
degree of the polynomial at fixed approximation quality 5 — 0.001. We compare 
naive, pairing, bitreversal and Montvay ordering schemes. 
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Figure 5.5: The quantity A is shown as a function of the degree of the 
Chebyshev polynomial n. We compare the naive, pairing, bitreversal 
(br) and Montvay (M) ordering schemes. 
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Chapter 6 

The test: Physical results 



6.1 Consistency tests 



In general, the results of valid simulation algorithms must be independent of tech- 
nical simulation parameters, random number generator types, irrelevant boundary 
conditions and measurement details like source or noise types used for the corre- 
lators or fermion densities. 

To check whether we can rely on the results of the program package used to 
generate the CPU cost data of Ch. ^ we present various tests comparing observ- 
ables from different measurement strategies to analytical results, those from HMC 
code or other groups. 

Critical kappa 

The critical kappa was estimated using the peak in the number of Conjugent 
Gradient solver iterations needed to invert Q. We perform dynamical simulations 
using the LBA with acceptance step method I on 16 x 16 lattices with values of 



/3 = 2, /3 = 6 and (5 = 10. Results shown in Fig. ^]T| to ^]3| lead to estimated Kc of 
0.287 for (3 = 2.0, 0.268 for (3 = 6.0 and 0.264 for (3 = 10.0 with quite large errors 
of about 0.02. The estimated values are indicated via a vertical line in the figures. 
We thus obtain the expected behaviour of Kc ~ t for high /3 and an increase in Kc 



as one decreases the value of /3 described in Sec. p^ . 

An independent and usually better way to estimate the critical kappa is the 
extrapolation to the kappa value at which the pion mass is vanishing. Using 
dynamical data from 16 x 40 and 16 x 32 lattices respectively, we estimate from 
Fig. a value of 0.266 for (3 = 6.0 and from Fig. ^ a value of 0.262 at /? = 10.0 
with errors of about the same order of magnitude as above. 

Both independent measurement schemes thus yield the same numbers within 
errors. Moreover, results also agree with expectations. 

Noisy scheme tests 

We simulate full dynamical fermions on 4x4 lattices and j3 = 1.0 using the LBA 
with different approximation qualities and approximation regions. The acceptance 
step method I acceptances or respectively the reweighting factors are calculated 
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both in a noisy and exact way. 

Plaquette. We compare in Tab. |U?T| to plaquette averages to the analytic 
values obtained using a hopping parameter expansion and the exact plaquette for 
pure f/(l) gauge theory as described in App. p.4| and p.3 . 



All calculations compare simulations with parameter choices n = 0,2, 10 and 
e = 0.5, 0.01 to the leading order hopping parameter expansion result which is 
believed to be reasonable up to k ~ 0.15. We generally accumulated statistics of 
about 10^ sweeps. Only in the case of the acceptance step method I used with 
n = 0, where the Metropolis acceptance step is correcting for the full fermion 
determinant estimated stochastically, this was clearly not enough. We accumu- 
lated in this case statistics of up to 10^ sweeps with integrated autocorrelations of 
0(1000). 

The agreement of the simulation results is evident in all cases. Analytical 
results are reproduced up to k = 0.15. For larger values of k systematic deviations 
from the analytic results are encountered as expected. 

Correction factor. Checking the influence of the noisy scheme on the deter- 



mination of the correction factor we compare in Tab. ^3] factors calculated from 
eigenvalues to those estimated. We show results for the correction factor from the 
eigenvalues of and from 1000 noisy estimation steps. All calculations were done 
on a 4 X 4 lattice with f3 = 1.0, k = 0.1, e = 0.01 and different n. 

The results show the compatibility of the exact and the noisy estimation 
scheme. The n = 2 results nicely demonstrate the importance of importance 
sampling, i.e. that simulating an almost fiat distribution is impossible for dynam- 
ical fermions as corrections become too large. They also show that the correction 
factor can in fact deviate decisively from 1. 



Different noises. We compare in Tab. |6.6| Z2 and Z4 complex Ising variables 



used in the noisy estimators for fermion densities as described in App. [A.5| . We 



calculate the fermion condensate {i'i') and pseudo-scalar density (ip'j^ip) iii the 
quenched case on 16 x 16 lattices for f3 = 2.0 and k, = 0.26. 

Results nicely agree with each other. We remark that the imaginary part of 
the pseudoscalar density was found to be zero to about 10~^°. 

U = 1 tests 

For U = 1 the condensate can be calculated analytically as described in 
App. |B.1| . We show results for this basic test from runs with links fixed to f/ = 1 



on 16 X 16 lattices for a range of n values in Tab. |6.7| . They obviously agree with 
the analytic predictions. 

Condensate tests 

In order to check the routines, a comparison to completely independent results 
is desirable. 

For the fermion condensate in dynamical simulations the group in Graz ob- 
tained data for the 2 flavour Schwinger model with Wilson fermions using a HMC 
update algorithm on 16 x 16 lattices at /5 = 2.0 [|Lan96 |. 
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In Fig. |6.4| we show the comparison of LBA with acceptance step method I 
data to these independent results. The agreement is evident. 

Random generator tests 

As mentioned in Ch. ^, we throughout this study used the high-quahty random 
number generator (RG) by M. Liischer [ Lue93|| to ensure that results are indepen- 



dent from random numbers. To further verify this, we compare quenched results 
on 16 X 16 lattices at /? = 10 obtained using a simple vectorized XOR random num- 
ber generator (XOR RG) ||Pre92|| and the vectorized Liischer generator (Liischer 



RG). In both cases periodic boundary conditions (BCs) are used. 

In Tab ^]8| and Tab we compare the condensate and pseudo-scalar density 
results to each other. These results clearly show that agreement is almost perfect 
up to the critical kappa ~ |. Both the condensate and pseudoscalar density 
values above Kc are almost meaningless, as errorbars are of the order of the results 
themselves. Still, even those results are consistent with each other. 

We remark that the pseudoscalar density results for large k, do not agree with 
zero as was expected. This effect is due to topological effects discussed in Ch. |^. 
Still, pseudoscalar results are smaller than the fermion condensate by orders of 
magnitude. 

Boundary conditions tests 

In a gauge f/(l) theory we expect no influence of periodic or anti-periodic 
boundary conditions as —1 is in the centre of the group. To check this, we repeat in 
Tab. |6.11| and |6.10| the periodic boundary condition data discussed in the paragraph 
above with anti-periodic boundary conditions. 

Agreement of results with those of the periodic boundary conditions runs is 
evident in all cases. 

Pion mass 

We check the implementation of the dynamical fermion update comparing mass 
results with dynamical fermions against an independent calculation. Using a Hy- 
brid Monte Carlo code, Irving [[lrv96|] gives a pseudoscalar vector (pion) mass 



of = 0.369(3) for a 32 x 32 lattice with (3 = 2.29, k = 0.26. We obtain 
m,r = 0.377(4) from about 2000 measurements using acceptance step method I, 
thus agreeing with high precision with the independent value. 

Symmetry tests 

We test that the alternative meson operators in the various channels give com- 
patible results. 

Quenched. Meson masses in the quenched approximation were obtained on 
16 X 32 lattices for (3 = 6.0 and k = 0.20 . . . 0.275 performing high-statistics 
runs with about 2000 independent measurements each. Fig. |6.5| shows the clear 
signal for the pion mass decreasing as k is increased, its alternative operator gives 
masses which agree within errors and is therefore not depicted in the plot. For the 
7] and its variant, consistency is also verified with larger errors. The higher states 
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unfortunately give noisy results. 

Dynamical. Masses were determined on 16 x 32 lattices for (3 = 10.0, k, = 
0.20 . . . 0.25 and are shown in Fig. We observe the same symmetry behaviour 
as in the quenched case, with the alternative operators for the eta and pion masses 
again agreeing within errorbars. Due to the increased complexity of the compu- 
tations, the data at large k and for higher states is very noisy compared to the 
quenched case. 



6.2 Choice of parameters 

We want to simulate at relevant physical parameter choices, in optimal cases un- 
derstanding the scaling towards the continuum and chiral limits. To justify the 
choice of parameters for the CPU cost studies, we show that finite size effects do 
not affect the mass results at the chosen lattice sizes, /? and k values. In a second 
step, we demonstrate that the simulation parameters chosen on our larger lattices 
correspond to the same physical situation with a lattice spacing which is smaller 
by a factor of about 1.6. 

Finite size effects 

In order to check the influence of the finite lattice extent, we regard the meson 
mass spectrum on small 8 x 20 lattices. Naively, one would expect that masses 
can be obtained in the region 0.5 < m < 2. 

We simulate on 8 x 20 lattices with a very conservative (3 = 3.0 because of the 



topological metastability problem for high P values as mentioned in Sec. p.2| and a 
run length of generally > lOOOr. The topological problems are discussed in more 
detail in Ch. ^ 

As can be seen in Fig. |6.7| , finite size effects are small up to a k value of about 
0.24. For larger k, a deviation from the approximately linear behaviour of the pion 
mass is detectable. 

To justify the linear fits for the pion mass, we included in Fig. also a fit 

2 

assuming the ma behaviour suggested by perturbation theory. It is evident that 
this describes the data far worse than the linear fit. We are obviously not yet in 
the regime where the leading order result is applicable. 

This results in a minimal pion mass m^(8 x 20) = 0.629 possible on this lattices 
and a mass ratio — = 0.807. These parameter choices are used for the CPU cost 
tests in Ch. || 

The scaling procedure described in Sec. ^]2| relies on the possibility to determine 
for a certain (3 the appropriate pion mass or k value corresponding to a fixed ratio 



This is illustrated in Fig. The data shows a clear dependency on the pion 
mass even taking the errorbars into account. A linear fit and an inversion of the 
functional dependency is thus feasible. 
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Scaling to larger lattices 

To achieve scaling towards the continuum limit, we aim to to reduce the lattice 
spacing going from (3 = 3.0 and lattices of size 8 x 20 to /? = 5.0 and appropriate 



larger lattices as described in Sec. 2.2 



We limit ourselves to these values of (3, as for higher /3 values topological 
metastabilities contaminate the data as discussed in Ch. ^ Mass spectra and fits 
for /3 = 4, 5, 6 on 16 X 40 lattices are shown in Fig. to |6.11| . This sequence of 



P values is included to show the deterioration of the data going to higher /3. 

We have to determine the pion mass or respectively the k value corresponding 
to the same mass ratio — as described above. To illustrate this, we include in 



Fig. |6.12| a plot of — versus the pion mass. The lower quality compared to the 



8 X 20 results of Fig. |6.8| is an indication how fast CPU costs become prohibitively 
large for high-precision studies at larger lattices. 

We fix the parameters for the optimal CPU cost search runs at (3 = 5.0 and 
K = 0.245, yielding a pion mass of m^(16 x 40) = 0.384 and a ratio of ^ = 0.826. 
As this ratio is compatible with the one obtained on the smaller 8 x 20 lattices, we 
thus achieve a scaling by a factor of ^'"(^fgxTo) ~ 1-64. Finite size effects are again 
small on the lattices we used as the pion correlation length is about 3. 
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Figure 6.1: Number of CG iterations. Wc show as a function of k the number 
of iterations necessary to invert Q to a precision of 10^® using a Conjugate Gradient 
inverter on 16 x 16 lattices for /3 = 2.0. The estimated Kc is indicated by a vertical 
line. 
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Table 6.1: Plaquette with exact correction factor. We show the dynamical 
average plaquette value on 4 x 4 lattices for P — 1.0. Columns 4 and 5 calculated 
with e = 0.5, columns 6 and 7 with e = 0.01. 
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Table 6.2: Plaquette with noisy correction factor. We show the dynamical 
average plaquette value on 4 x 4 lattices for P — 1.0. Columns 4 and 5 calculated 
with e = 0.5, columns 6 and 7 with e = 0.01. 
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Figure 6.2: Number of CG iterations. Wc show as a function of k the number 
of iterations necessary to invert Q to a precision of 10^® using a Conjugate Gradient 
inverter on 16 x 16 lattices for /3 = 6.0. The estimated Kc is indicated by a vertical 
line. 
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Table 6.3: Plaquette with exact acceptance step. We show the dynamical 
average plaquette value on 4 x 4 lattices for P — 1.0. Columns 4 and 5 calculated 
with e = 0.5, columns 6 and 7 with e = 0.01. 
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Table 6.4: Plaquette with noisy acceptance step method I. We show the 
dynamical average plaquette value on 4 x 4 lattices for /3 = 1.0. Columns 4 and 5 
calculated with e = 0.5, columns 6 and 7 with e = 0.01. 
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Figure 6.3: Number of CG iterations. We show as a function of k, the number 
of iterations necessary to invert Q to a precision of 10~^ using a Conjugate Gradient 
inverter on 16 x 16 lattices for /3 = 10.0. The estimated Kc is indicated by a vertical 
line. 



300 



250 



200 



150 



100 



50 



I I I I I I I 



- 



I I I 



I I I 



I I I I I I I I 
o 



- 



0.20 0.22 0.24 0.26 0.28 0.30 



K 



Table 6.5: Correction factor. We show the average correction factor det[l — R] 
calculated exactly from the eigenvalues of and stochastically estimated from 
dynamical simulations on 4 x 4 lattices for (3 = 1.0 and «; = 0.1 using generally 
e = 0.01 and different n. 
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Table 6.6: Ising Random Number Test. We show the quenched average 
fermion condensate and pseudoscalar density on 16 x 16 lattices for /3 = 2.0 and 
K — 0.26 using different Ising noises with Z4 or Z2 symmetry. 
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Table 6.7: U = 1 Condensate. We show the average fermion condensate from 
U — 1 simulations on 16 x 16 lattices, compared to analytic results. 
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Table 6.8: Condensate — periodic BCs. We show the quenched fermion 
condensate on 16 x 16 lattices at P — 10.0 comparing results from Liischer RG to 
those of XOR RG. 
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Figure 6.4: Fermion condensate. We show as a function of n the dynamical 
fermion condensate, calculated on 16 x 16 lattices at /3 = 2.0. 
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Table 6.9: Pseudoscalar density — periodic BCs. We show the quenched 
pseudoscalar density on 16 x 16 lattices at /9 = 10.0, comparing results from 
Liischer RG to those of XOR RG. 
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Table 6.10: Condensate - anti-periodic BCs. We show the quenched fermion 
condensate on 16 x 16 lattices at /3 = 10.0, comparing results from Liischer RG to 
those of XOR RG. 
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Table 6.11: Pseudoscalar densities — anti-periodic BCs. Wc show the 
quenched pseudoscalar density on 16 x 16 lattices at (3 = 10.0, comparing results 
from Liischer RG to those of XOR RG. 
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Figure 6.5: Meson mass spectrum. We show quenched meson masses as a 
function of k calculated on a 16 x 32 lattice at /3 = 6.0. Included are pion masses 
from the operators ipj^rip and 'ip^^^^Tip, eta masses from 'ip'j^ip and ■07^7°'0 and 
aO masses from iprip. Both pion operators give identical results. 
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Figure 6.6: Meson mass spectrum. We show dynamical meson masses as a 
function of k calculated on a 16 x 32 lattice at (3 — 10.0. Included are pion masses 
from the operators ip'-f^rip and ip'-f^'-f^rip, eta masses from ip'-f^ip and ip'-f^^f^ip and 
aO masses from iprip. We give just one pion mass if results are within errorbars. 
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Figure 6.7: Meson mass spectrum. We show dynamical meson masses as a 
function of k calculated on a 8 x 20 lattice at /3 = 3. 
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Figure 6.8: Mass ratio. We show the dynamical pi/eta mass ratio as a function 
of the pion mass, calculated on a 8 x 20 lattice at (3 — ?>. 
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Figure 6.9: Meson mass spectrum. We show dynamical meson masses as a 
function of k, calculated on a 16 x 40 lattice at (3 — A. 
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Figure 6.10: Meson mass spectrum. We show dynamical meson masses as a 
function of k, calculated on a 16 x 40 lattice at P — 5. 
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Figure 6.11: Meson mass spectrum. We show dynamical meson masses as a 
function of k, calculated on a 16 x 40 lattice at /3 = 6. 
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Figure 6.12: Mass ratio. We show the dynamical pi/eta mass ratio as a function 
of the pion mass, calculated on a 16 x 40 lattice at /3 = 5. 
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Chapter 7 

The problems II: Topology 



Constructing decorrelated configurations in lattice simulations can be difficult if 
large energy barriers exist between regions of the configuration space to be sam- 
pled. Unfortunately, one such possibility are topological sectors for the U{1) model 
|Gat95|] , part of the Schwinger model [|Joo90i |Gat97ai [Dil93 | we are considering in 



this study. 

While local updates do not work well, we can use in two dimensions a global 
heatbath update to obtain reasonably high tunnelling rates. This trick obviously 
works also in the quenched case, while for full dynamical simulations global heat- 
bath updates are not known. 

A possible way of dealing with such systems which has been suggested | |Gut97| 
is to stay in one fixed topological charge sector and define quantities in that way. 
In order to study this for relevant observables, we measure pion correlators in 
fixed (low) topological sectors. In the quenched case we compare local and global 
update schemes. For dynamical fermions we compare low and high f3 results, using 
both the local bosonic algorithm (LB A) ||Lue94 ] and Hybrid Monte Carlo (HMC) 
Dua87||. 



7.1 Algorithm 

Quenched case: local update. We use a local link update consisting of one exact 



heatbath ||Bes79|| and three over-relaxation steps per trajectory. 



Quenched case: global update. Alternatively, it is possible to use a global heat- 
bath for the plaquettes. As the new configuration is constructed without recursion 
to the old one, there are no problems with metastabilities. 

The global update uses the fact that nearly all plaquettes are independent even 
on a finite lattice. The plaquettes only have to satisfy the constraint 

I[Up = l. (7.1) 
p 

We are therefore able to update LT — 1 plaquettes with a heatbath algorithm. The 
last plaquette is then determined by the condition Eq. |7]l| and the configuration 
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has to be accepted or rejected according to a Metropolis decision to ensure the 
correct distribution. Finally, this plaquette configuration has to be translated into 
a valid link configuration. To achieve this, we utilise the freedom to choose a 
gauge. We use a maximal tree prescription, setting LT — 2 links to 1. Then LT 
links can be recursively determined from the plaquettes. The unconstrained two 
remaining links correspond to the free Polyakov loops Pt and Pl in 2 dimensions. 

We want to state that the problem of slow topological charge fiuctuations could 
also be solved by explicit topological updates |pil93| ]. Unfortunately, this trick is 
not applicable in the presence of dynamical fermions. Improvements claimed to 
increase tunnelling for staggered fermions ||Dil93|| did not work for the Wilson 
fermions used in this study. Whether a recently suggested reweighting method 
to enhance topological updates |For97c|| really reduces the problem is still under 
discussion. 

Full dynamical simulations. In this case we use the Hermitean version of Lii- 
scher's local bosonic algorithm with noisy acceptance step method I as described 
in Ch. ^ To have an independent check, we compare to a Hybrid Monte Carlo 
algorithm implemented as described in Ch. ^. 



7.2 Topological sectors 

The integer-valued topological charge functional 



|- / (i'xe^^F^, (7.2) 



can be represented on the lattice by 



ZTT p 



with plaquette angle (j)p = lmln{Up) G (— tt, tt) ||Lue82|| . We denote by tunnel 



events all updates resulting in a change of the topological charge and as tunnel 
probability the number of tunnel events divided by the total number of updates. 
The topological susceptibility is defined via 

Xtop = ^[<Q'>-<g>']. (7.4) 

We demonstrate the relevance of topological sectors on a 16 x 32 lattice showing 
in Fig. |7]l| the tunnel rate plotted against (3 for local and global updates. The 
exponential decrease of the tunnel probability for this local update algorithm gives 
rise to metastabilities in simulations at large (3. We therefore consider it worthwhile 
to investigate observables within fixed topological sectors. 
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7.3 Simulations 



Simulation parameters. We simulate on 8 x 20 and 16 x 40 lattices at a beta value of 
/? = 12 generating 10000 configurations. We only show data for the larger 16 x 40 
lattices. For the fermion part, we choose k = 0.24, where the pion correlation 
length (in the quenched case) is found to be around 3 and finite size effects can be 
expected to be small. 

Local updates. We monitor the topological charge during the simulations. Due 
to metastability, no tunnelling of Q is observed in the runs with local updates. 
At /? = 12 we therefore are able to perform a simulation in a given topological 
sector by using an initial configuration with this particular charge. This is done 
generating a classical homogeneous plaquette configuration of the desired charge 



and converting this to the links as described in Sec. |7.1| . The two Polyakov loops 
Pt, Pl are chosen according to a flat random distribution. 

Global updates. In the limit of independent plaquettes the topological suscep- 



tibility can be calculated analytically as shown in App. [B.2| . The result in the 
large /3 approximation for /5 = 12 is 

Xtop|,3=i2 ~ ^\P=12 = 0.0022 . (7.5) 

To check the global update, we simulate on a 16 x 32 lattice obtaining Xtop = 
0.0021(1). This shows that the global update works well without metastabilities 
in the topological charge. 

Observables. We generally measure the pion correlator using the prescription 
detailed in App. |A.5| . 

Quenched results 

The results of the quenched runs are shown in Fig. [7.2| . Obviously, for (5 = 0,4 
we are not able to find a plateau in the effective mass plots. We find a valley-like 
structure in the mass for the low Q cases. For high Q this turns into a hill-like 
structure with the peak situated at half of the temporal lattice extent. To show 
that in the intermediate region the valley and hill structures can approximately 
cancel and suggest a fake plateau, we include the Q = 1 plot. 

In the quenched case, we are able to compare to the results using a global 
update scheme. For the global update we find a tunnel probability of P = 0.76 
and therefore do not expect any influence of topology. The effective mass is shown 



in the lower right plot of Fig. \l.2l A plateau is clearly more reasonable than in 
the fixed Q cases. 

Dynamical fermion results 

Effective masses from full dynamical simulations are shown in Fig. |7.3| . We 
do not expect quantitatively the same results as in the quenched case, yet the 
behaviour is qualitatively similar. 

To establish that dynamical results are not influenced by the chosen parameters 
of the local bosonic algorithm, we repeat the calculation with topological charge 
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Q = A using a standard HMC algorithm. This is also shown in Fig. |7.3| . The 
results agree nicely within errorbars. 

From these results, we conclude that we need to average over the topological 
sectors to obtain a plateau in the effective mass. 

7.4 Projections to topological sectors 

To gain further insight, we now use a slightly different approach. In principle, we 
could also restrict ourselves to definite topological sectors by selecting measure- 
ments with fixed topological charge from a simulation, i.e. effectively simulating 
the path integral given by 

Z[q] = I D[U]D[^P]D[i;]SQ,,e~' . (7.6) 

To this end we need simulations with a reasonably high fluctuation. Such simula- 
tions can be done e.g. with dynamical simulations at low (3 or quenched simulations 
using global updates. 

Full dynamical case. We work at low P = 1 with a slightly smaller k = 0.22. 



Effective masses are depicted in Fig. 7.4. We can detect no discrepancy between 



masses calculated in different topological sectors. This result was also reported by 
a group working with staggered fermions, which concluded that topological sectors 
are of no importance to mass estimates in the Schwinger model [ |Gut97 . 



Quenched case. Here we exploit the opportunity to use the same parameters 



f3 = 12, K = 0.24 as in Sec. Ol The results are shown in Fig. fTS]. At this f3, we 



do not find agreement. Rather the effective masses are nearly the same as in the 



simulations without tunnelling presented in Fig. |7.3| . We would like to point out 
that they do not agree within errorbars. On the other hand, we remark that the 
statistical sample was very much smaller for the projected data due to the fact 
that only a part of the generated configurations is projected into the appropriate 
sectors. 

The striking difference between low and high (3 results makes a sound under- 
standing of this behaviour highly desirable. As can be seen from the quenched 
results here, it is evidently not just the averaging over the topological sectors (as 
found in Sec. [7.3[ ) which is lacking in high (3 simulations. There seems to be some 
more subtle dynamical effect involved. 

7.5 External gauge configurations 



In order to investigate how much averaging is necessary, we plot in Fig. |7^ effective 
masses for external configurations with fixed topological charge Q = and Q = 
4. These are generated from homogeneous plaquette configurations in the way 
described in Sec. [7.1| . For the fermions we use k = 0.24. 

We stress that we use random Polyakov loops Pt, Pl in both cases, so that 
one should not expect free fermion behaviour in the Q = case. For Q = 4 a 
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Figure 7.1: Tunnel probability. Topological tunnel probability of pure U{1) as 
a function of f3 using local and global updates on a 16 x 16 lattice. 
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completely irregular behaviour is observed. It is thus not possible to measure a 
meaningful pion correlator from one (even very smooth) configuration alone. The 
translation invariance is manifestly broken even for that smooth configuration. 

The result of averaging over 10 values of Pt and Pl is shown in Fig. [7.7| . We 
clearly regain the qualitatively expected regular valley and hill structure observed 
in Fig. |7.3| in both the Q = and Q = 4 case. 
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Figure 7.2: Effective pion iriciss. Quenched efTective pion mass as a function 
of time for (3 = 12.0, k = 0.24. 
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Figure 7.3: Effective pion mass. Dynamical effective pion mass as a function 
of time for (3 = 12.0, k = 0.24. 
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Figure 7.4: Effective pion mass. Dynamical projected effective pion mass as a 
function of time for (5 — 1.0, n — 0.22. 
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Figure 7.5: Effective pion mass. Quenched projected effective pion mass as a 
function of time for (5 — 12.0, n — 0.24. 
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Figure 7.6: Effective pion mass. Shown as a function of time from one external 
configuration at k = 0.24. 
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Figure 7.7: Effective pion mass. Shown as a function of time from 10 external 
configurations at k = 0.24. 
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Chapter 8 

The aim: CPU cost optimisation 



We remind the reader of the schemes included in our investigation. Besides 

• Hybrid Monte Carlo (HMC) to set the scale, 
we compare 

• LBA with reweighting, 

• LBA with acceptance step method I , 

• LBA with acceptance step method I with adapted precision and 

• LBA with acceptance step method II 

as described in Ch. ^, investigating the CPU cost behaviour changing n, e and the 
number of reflections for the simulation runs. 

We would like to point out again that reweighting and acceptance step algo- 
rithms result in two different ensembles, so that one has to use the effective CPU 



cost Ceff = A^totai Q ops " defined in Sec. 



8.1 8 X 20 lattices 

Our search for optimal CPU cost behaviour is conducted on 8 x 20 lattices with 
parameters (3 = 3.0 and k = 0.24 measuring costs for the average plaquette and 
the pion correlator at distance At = 3. 

We generally use anti-periodic boundary conditions and 5 hits for the link heat- 
bath algorithm to provide high acceptance rates. To ensure numerical stability, we 
apply the bitreversal scheme for ordering of roots as described in Ch. ^]2| whenever 
a Chebyshev polynomial in the factorized form was necessary. The rescaling factor 
Cm (viz. App. |A.4| , |[Lue94|| ) is set to a conservative cm = 1.02, so that there is 
no slowing down from the large eigenvalues of Q^. The precision of the inverters 
demanded is generally 10~^, with 10~^ for the reduced precision of the adapted 
precision scheme. We accumulated statistics of 10000 calculation sweeps, applying 
1000 sweeps thermalization with integrated autocorrelations generally below 50. 



To illustrate our search for optimal parameters, we depict in Fig. |8.1j and 
Fig. |8.2| the CPU cost in the n — e plane, where the number of reflections is 
optimised for each n, e pair. The figure clearly shows that we obtain fiat optima. 



In Tab. ^.1| we show the CPU cost for the Hybrid Monte Carlo algorithm run 
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with different acceptance probability. This illustrates that the cost is not very 
dependent on the acceptance. To set the scale, we take the optimal values for an 
acceptance of 0.84. 

We in Tab. compare the CPU cost of only the optimal parameter sets of 
the various LBA variants with the HMC scale. 

We find that the LBA is doing better than the HMC by a factor of about 3 
for the plaquette and about 2 for the pion correlator at distance At = 3. Thus 
the gain in the plaquette is typically better than that in the pion correlator. The 
reweighting method is a bit worse than the optimised acceptance step methods, 
losing for both observables by about a factor of 1.3. The gain using the adapted 
precision trick is quite considerable for the pion correlator (1.6), while it is even a 
slight detoriation for the plaquette as the overhead is too large. We remark that the 
configurations found for the optimised and unoptimised acceptance step method 
I scheme runs are not identical as one might expect in the case of no numerical 
errors. Overall, the number of reflections is important to the optimisation. A 
fixed number of 1 would not have reproduced the real optima. This leaves room 
for improvement in simulations where this has not yet been utilised. 



8.2 16 X 40 lattices 

For the larger lattices we choose the parameter values /5 = 5.0 and k, = 0.245 on 
16 X 40 lattices as detailed in Sec. |6.2| . Technical simulation parameters are the 
same as described in Sec. ^TT[ 

We again illustrate our search for optimal parameters in Fig. F]3| and Fig. ^.4|, 



showing the CPU cost in the n — e plane, where the number of reflections is 
optimised for each n, e pair. The figure shows that although optima are fairly flat, 
we do not find as flat behaviour as for the smaller lattices. We admit, though, 
that somewhat larger n should have been included in the study. 

In Tab. |8.3| we give the CPU costs for the Hybrid Monte Carlo algorithm run 



with different acceptance probability. As we did not study this as exhaustively as 
in the smaller lattices case, we take the optimal values from all HMC runs to set 
the scale. 



We in table |8]J again compare CPU cost of the optimal parameter sets of the 
various LBA variants with the HMC scale. 

We find that the LBA is again doing better than the HMC by a factor of about 
3 for the plaquette and about 2.5 for the pion correlator at distance At = 3. Thus 
the gain in the plaquette is again better than that in the pion correlator, but not 
by as large a margin as for the smaller lattices. The reweighting method is again 
less efficient than the optimised acceptance step methods, this time even more 
so than for the smaller lattices, losing for both observables by about a factor of 
1.8. The gain using the adapted precision trick is negligible for both observables. 
Overall, the number of reflections is not as important to the optimisation as for 
the small lattices. 
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Table 8.1: HMC cost. Simulations at /3 = 3.0, k = 0.24 on 8 x 20 lattices. 
We vary the number of trajectory steps ritr, while holding the trajectory length 
ritr ■ At constant. 
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Table 8.2: LBA CPU cost minina. Simulations at /3 = 3.0, k = 0.24 on 8 x 20 
lattices. 
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Table 8.3: HMC cost. Simulations at /5 = 5.0, k = 0.245 on 16 x 40 lattices. 
We vary the number of trajectory steps ritr, while holding the trajectory length 
ritr ■ At constant. 
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Table 8.4: LBA CPU cost minina. Simulations at /3 = 5.0, k, = 0.245 on 
16 X 40 lattices. 
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Chapter 9 



Conclusions 

The massive two-flavour Schwinger model has physical properties similar to QCD 
in four dimensions. It is much easier to simulate even with dynamical fermions 
and allows to determine observables with high precision as the cost will be lower 
by orders of magnitude as compared to the case of QCD in four dimensions. This 
makes the Schwinger model a reasonable testing ground for dynamical fermion 
algorithms. Still, in the case of the local bosonic algorithm, this study is the first 
application using Wilson fermions. 

On the other hand, due to problems with topological sectors (as discussed 
below) and the scaling properties towards the continuum limit, it is definitely not 
that easy a toy model as one might expect. 

9.1 Instabilities 

In a class of fermion simulation algorithms relying on the local bosonic algorithm 
(LBA) a matrix valued Chebyshev polynomial is involved. Recursion relations 
allow the evaluation of these polynomials in a numerically stable way. Yet in 
a number of cases, like our implementation of the Acceptance step method I in 
the Hermitean LBA or the Polynomial Hybrid Monte Carlo (PHMC) algorithm 
||t're97a|| , the polynomial is needed in the factorized form. Then rounding errors can 
easily lead to significant precision losses and even numerical instabilities, especially 
if simulations are done on machines having only 32-bit floating point arithmetic 
precision. 

We investigate the effects of using various ordering schemes of monomial fac- 
tors, or equivalently the complex roots, on the numerical construction of the 
Chebyshev polynomials now commonly used for the LBA. We find that differ- 
ent ordering schemes for the roots can lead to rounding error effects ranging from 
numerical overflow to retaining a precision comparable to the one numerically 
stable recurrence relations can provide. 

In the case of a Chebyshev polynomial of a single real variable s approximating 
the function 1/s, we find that the bitreversal scheme and a scheme suggested by 
Montvay can keep rounding error effects to a low level for degrees of the Chebyshev 
polynomial up to n ~ 220. 



70 



Applying these reordering schemes to the evaluation of a matrix valued polyno- 
mial, we study numerical simulations of 4-dimensional lattice QCD. We find that 
Montvay's ordering scheme of the roots seems to be particularly suited for this 
problem. The rounding errors could be kept on a level which is comparable to the 
one that is reached when using the stable Clenshaw recurrence relation. 

We conclude that the precision with which the numerical evaluation of the 
Chebyshev polynomial can be performed depends strongly on the chosen ordering 
of the roots. We expect also severe consequences for the dynamics of the simulation 
algorithms where Chebyshev polynomials in the product representation are used, 
depending on the root ordering scheme employed. 

As the most important outcome of this investigation, we consider that there 
exist orderings of the roots which allow a numerically very stable evaluation of a 
Chebyshev polynomial, even up to degrees n of the polynomial of about 200. Since 
these values of n correspond to degrees of the Chebyshev polynomials commonly 
used in simulations, we consider our findings as promising for future applications 
of the local bosonic algorithm. 

9.2 Topology 

During these studies we encountered simulations at high (3 (~ 8) exhibiting me- 
tastabilities in the topological charge. For dynamical simulations, no cure to this 
problem is known. 

Motivated by this, we studied simulations at extreme (5 values (~ 12) which 
are effectively at fixed charge as no tunnelling occurs within these runs. Results 
for effective pion masses show that a definition of a pion mass from a plateau 
is not possible for either quenched or dynamical simulations even for vanishing 
topological charge. 

A comparison with quenched global updates exhibiting no metastabilities demon- 
strates that a plateau can be found in the correct path integral sample. Further- 
more, cross-checks against the Hybrid Monte Carlo algorithm (HMC) indicate that 
this problem is not an artifact stemming from the fermion algorithm. 

To gain insight into the mechanism, we studied effective pion masses calculated 
from projections to fixed topological sectors. Results from fluctuating ensembles 
show no dependence on the topological charge. On the other hand, those projected 
from fluctuating quenched ensembles at high (3 show approximately the same prob- 
lematic behaviour as simulations completely without tunnelling. This suggests a 
rather subtle dynamical effect we do not understand yet. 

We flnd that external homogeneous plaquette conflgurations with flxed Polyakov 
loop values Pt and exhibit completely irregular behaviour. After averaging over 
Pt and Pl wc regain the effective mass results characteristic for the topological 
charge sectors these conflgurations lie in. 

We conclude that there is a need to obtain a better understanding of the 
interplay between the dynamical mass generation of mesonic states and topological 
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sectors. We would like to point out that higher statistics runs could reveal similar 
phenomena in other models with non-trivial topological structure. 



9.3 CPU cost optimisation 

We optimised the CPU cost of the local bosonic algorithm varying 3 technical 
parameters of the algorithm, namely the number of over-relaxation steps in each 
trajectory, the order of the approximating Chebyshev polynomial n and the lower 
cut of the approximation region e. We simulated at two different lattice sizes 8 x 20 
and 16 x 40 keeping the physical mass ratio ^ = 0.81 approximately constant. 
This resulted in pion masses of about m.,^ = 0.629 and ttItt = 0.384, or equivalently 
in a scaling of the lattice spacing of about 1.6. 

The tuning of the LBA is demonstrated to be surprisingly easy. It became 
more difficult for the larger lattices, though. Technically, the number of reflections 
per heatbath update, usually fixed in other studies of the algorithm, is found to 
be an important optimisation tool. 

The CPU cost can be lower than for HMC, but not by a large factor with 
present techniques. We find a gain for the plaquette of approximately 3 for both 
lattice sizes and for the pion propagator of ~ 2 on the smaller and ~ 2.5 on the 
larger lattices. The gain thus differs and estimates from plaquette-like observables 
are too optimistic. Still, our main point is that the gain is detectable and consistent 
for both lattice sizes. 

We also demonstrate that using a noisy Metropolis acceptance step scheme 
to make the LBA exact is also possible for the Hermitean case. The use of the 
Gegenbauer solver, which avoids instabilities in the evaluation of the polynomial, 
in the method labelled as acceptance step method II, is shown to be competitive 
to CG in the first real simulation. Costs are virtually identical to those of the 
acceptance step method I for both lattice sizes. 

In general, all investigated acceptance variants performed similarly. The re- 
weighting method, though, had decisively higher costs. Especially for the larger 
lattices it performed worse than the acceptance step methods for both observables 
studied by a factor of 1.8 (1.3 for the smaller lattices). Still, as the difference 
is not really large, the possible advantages of this method regarding exceptional 
configurations ||I're97|| make further study worthwhile. 



The optimisation of the acceptance step method I, interrupting the solver iter- 
ations and checking at intermediate steps whether the solution quality was already 
good enough for the requested Metropolis decision, did result in a significant gain 
of a factor of 1.6 compared to the unoptimised version regarding the cost of the 
pion correlator on the smaller lattices. On the larger lattices, though, we did not 
find any improvement. As this is nevertheless an optimisation offering possible 
gain, future studies should have this trick in mind as the gain could be different 
for larger inverter trajectories. 

We conclude that these results give further evidence that the LBA is a com- 
petitive algorithm for the simulation of dynamical fermions. 
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Appendix A 



Conventions 



A.l General conventions 

Dimensions. We use the standard conventions 

h = c^l. (A.l) 

In d — 2 dimensions then follow 

[p] = [e] = [m] 

[l] = [t] = [m]-'. (A.2) 

The inverse lattice coupling P = -J^ and the Wilson parameter k — 2{ma+d) 
dimensionless. Gauge links U^^^ are dimensionless, while Grassmann fermion fields 
-0 have the dimension [mjz. In the electromagnetic tensor 

F-=[l f) (A.3) 

in 1+1 dimensions only the electric field E appears, as because of the missing 
transverse directions no magnetic field is existing. The electric field has the same 
dimension as the charge. 
Indices. We denote 

flavour indices with Latin lower case letters usually starting with a, 

lattice sites with Latin lower case letters usually starting with x, 

spinor indices with Latin lower case letters usually starting with s. 

Directions we abbreviate with Greek lower case letters i/, 1, 2, . . . 

and use barred letters for the orthogonal direction (only one in 2D) /i, P, 1, 2, . . . . 

Generally, 1 signifies the spatial direction, 2 the temporal. 

Unit vectors are denoted by a hat /i, /i, 1, 1, . . .. 

Trace conventions. The different traces are 

TR = ^ ; i.e. sum over sites, Dirac and flavour indices, 

x,s,a 
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Tr = ^ ; i.e. sum over sites and Dirac indices, 

x,s 

tr = ^ ; i.e. sum over Dirac indices, 

s 

trf = ^ ; i.e. sum over flavour indices. . (A.4) 

a 

In an update trajectory, new variables and configurations are primed, the old ones 
unprimed. 

Links. We use compact U{1) link variables U^^fM- Plaquettes are defined starting 
at the lower left corner going anti-clockwise 

Up. = Ul^Ul^^^^U^^^^^^U^,,. (A.5) 

Plaquette angles (px £ [— tt, tt] are defined via 

e"f'^ = Upx. (A.6) 

Polyakov loops Pt, Pl are used averaged over the orthogonal direction, i.e. 

PL = ^jl n and PT = \jz n . (A.7) 

X2=l a;i=l xi=lx2=l 

Fermions . Spinors are denoted with Greek letters ip^^s throughout. 

A. 2 Gamma matrices 

We use Hermitean gamma matrices 



and a Hermitean 7^ 

7^ = .7V=(^ ;) (A.9) 

with the relations 

{7^ 1"] = 26^'\ {7^ 7^'} = 0, 7'7V = -Y ■ (A.IO) 

Note that 

7^(1 ± 7^)7^(1 ± 7^^) = 1 ± 7V7^ ± 7^ ± 7V7V = , (A.ll) 



i.e. that the matrix (Eq. |A.26| ) does not connect sites with two straight links 



in-between. For general purposes we introduce the set 

r e f = {1,(Z7^),7°,(Z7^)} (A.12) 

chosen so that 

{ipT^y = (t/jYtfj) Vr . (A.13) 



74 



A. 3 Flavour matrices 



The flavour matrices set is 

T G f = {r+,r-,l,r°}, (A.14) 
where the tau matrices are derived from the Pauli matrices given by 

via 

r+ = -^ir' + ^r% = ^{r' - zr') . (A.16) 

The Hermiticity relations 

1^ = 1, r°t = r°, r+^ = r- (A.17) 
hold. Further useful relations are 

trf(r") = , trf(r"V") = 2 , trf(l) = 2 (A.18) 

and 

^trf(r'^V") = 3trf(r''V'') V6 . (A.19) 

a 

A. 4 Fermion matrix 

Starting from the standard Wilson fermion matrix 

M = 5,,y - + inU.-f.,, + - inUi,) (A.20) 

with the Wilson kappa definition 

2(m + a) 

we define via 

M =1-kH (A.22) 

the hopping matrix H. The abbreviation G for the Greens function M~^5fiavour is 
used throughout. We construct a Hermitean matrix Q scaled such that eigenvalues 
lay in [-1,1] fLuel: 



via 



Q = Cor'S.,y - Co/c ^(4-A..7'(l + rW.-f.,, + 5.+A,y7'(l - ^^1,) (A.23) 
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with the scahng factor 

'm + d 1 1 1 /Ao.\ 

^0 = —r^— = T—^ ' c^^l- (A-24) 

m + zacM I + ZcLkcm 

Inserting exphcit gamma matrices, this results in the programmed formulae for Q 

{Q4>)x,l = + ^Co0x,2 

- icoKUli{(f)x+i2 - (Px+u) 

{Q<P)x,2 = ~ ^Co0^,l 

+ 2icoKt/^._o,o0a-o,i 

+ ^co/€f/,_i,(0,_i, +0,_io) (A.25) 



and 



( (l-7'^)(l + 7^') Ux-f,,^Ux- 

+ Sx+f^-f^',. (l + 7^)(l + 7^') t/i^f/.+A-A',M' 
+ 6x-f.+f.',. Ux^f^M-f.,,' 

+ 5x+f.+f.>,. (l + 7'^)(l-7^') f/i^f/iw)- (A-26) 
The diagonal parts are 



Qx,x Co7 



5 



ico 
—ico 



A. 5 Meson operators, fermion densities and cor- 
relators 

Operators. As meson operators we introduce fermion bilinears 

or = ^xTT^x , (A.28) 

sandwiching both a gamma matrix F G F and a flavour matrix T E T from 
the sets described above. We show the list of operators in 2 dimensions and the 
corresponding mesonic states in Tab. ( |A.1|) . The 2D peculiarities 
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Table A.l: Meson Operators. 
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Pi 
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r 




TT 



75^" = and i7^7^ = -7" (A.29) 

reduce the possible gamma structures to four. The trivial action of 7°, if used on 
momentum zero states, leads to still further reduction to 2 relevant operators each 
for the flavour singlet and triplet. We remark that these symmetries are not used 
in the program. Correlations from all operators are explicitly calculated and the 
symmetry properties verified. 

Fermion densities. To evaluate densities we need the general expression 

■ ■ ■ r.-A) = • • • (^.so) 

specialised to the 2 fermion isosinglet case 

^(<,r„4°.> = ^i:r„M-.W- (A.31) 

Corresponding to the four possible gamma matrices F G F we obtain 

• for F = 1 the fermion condensate, 

• for F = 7^ the axial condensate or pseudo-scalar density, 

• for F = 7^,7° the 7° and 7^ densities. 

Rewriting this in terms of Q (Eq. |A.23|) and normalising by a constant ^'^ ^^P" 
press trivial volume and dimension behaviour we arrive at the desired expression 

° E lrt^tsQx!s;x,r ■ (A. 32) 
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Noise. We use noise vectors with the properties 

< vlVy >= ^x,y, <r]x>=0 , (A. 33) 

where the components are either continuous, having a Gaussian distribution, or 
are discrete Z2 (±1) or Z4 (±1,±J) Ising variables. The vectors do not carry 
spinor indices. 

Noisy scheme. Using these noise vectors we apply the noisy estimator of the inverse 

Q^r].s = {iv:j:Q^Lv.)) (A.34) 

z 

where the inversion has to be done for each spinor component separately to obtain 
the full information. 

Correlators. For the meson 2-point functions (correlators) we specialise Eq. |A.30 
to the 4 fermion case 

Cir'^' = ii^.TTij^i^yVr^y) (A.35) 
can then be expressed as 

abcd,af3'yS 
abcdja/BjS 

E-p rpab-pl rpicd I ride /^ba /^bc /^da \ ( \ oc\ 

abcdjajSjS 

For operators having an expectation value we have to subtract this value from the 
correlator to get the connected Greens function. This is not explicitly written out 
here, nor is it used in the analysis routine - there the constant part is simply fitted. 

Using the flavour structure of the Greens function, it is possible to write general 
formulae for all flavour matrices 

^xy ~ E ^al3^'y5(^{i^lTyGyS^y^Gxl3,xa — {^^iT''T)Gxl3,y^GyS,xaj , (A. 37) 

where we introduce the simplified notation 

Cl^ = {^PxTTM^yTT^Pyy) = Cl^--'^'^'-\ (A.38) 
Using the connection to the scaled matrix Q 

G = coQ-W; = Q (A.39) 

we get 

= E c^(7'r).;,(7^r),,((trfT)2Q-4^Q;^i^^ - {tT,T^T)Q-ly,Q-lx.) , (A.40) 
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where the matrices 

A,s = Y.^%T,s (A.41) 
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can be calculated and coded in the program. 

We generally use correlators with randomly located point source and summa- 
tion over the target time slice 

C^'^{t) = 5I*^(x+xo,t+to)(xo,io) ' (A.42) 

X 

where the index structure should be clear from the context. 

Noisy scheme. Using Gaussian or Ising noise vectors with the properties Eq. |A.33 



we are able to sum over the time slice S'(xoto) including the initial site xq, to- Using 
Hermiticity of Q we can derive in the triplet case 

^triplet(^) ~ ~6Co ^ (7 T)abi'y ^)cd Q x+xot+tob;xotoc Q xotod;x+xot+toa 
xabcd 

= ^ E {l'^)ai.{l'T),,{Y.Q~xU-,t,b-,.t,^^^^^^ (A.43) 

xabcd 2 a;o 

and, correspondingly, for the singlet 

CL^Ut) = lcl,,iotit) (A.44) 

xotoS,xoto'yQxo+xto+tf3,xo+xto+ta J 1 

xaP^S 

where one would estimate the disconnected part from the full point source. 
Pion. For the pion a special trick is possible. The gamma structure allows to use 
noise vectors including spinor indices with 

< V*xs%t >= S^,ySs,t, < Vxs >= . (A.45) 

Using these only one inversion is necessary, as can be seen from 

i^) ~ E ^a,bSc,d Q x+xot+tob;xotoc Q xotod;x+xot+toa 

xabcd 

+xot+toa;xotocVxoc 

)t) . (A.46) 

ax ze xqc 
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Appendix B 
Analytic results 



B.l Free fermions 

We just give the result for the fermion condensate using free fermions where the 
momenta are restricted using anti-periodic boundary conditions to 

fcM = 7^K + i); n^e{0,...L^-l} . (B.l) 



We obtain the condensate 



' (TRW))-^E ^.....Jt;.^?^-T.*^ ...... . (B.2) 



UfVd V ^ Y.f.i'^f^ sin k^y + (1 - 2k E/, cos 

B.2 Pure gauge topological susceptibility 

In the hmit of independent plaquettes the topological susceptibility can be calcu- 
lated analytically. Introducing the generating functional with periodic BCs for Np 
independent plaquettes we obtain 



z{p,e) 



TT 27r 



_^ieQ+p cost, 



Np 



(B.3) 



where (p denotes the plaquette angle defined in Eq. |A.6| . From this the topological 
susceptibility is given by 

Xtop = ^d's\ogZ{P,e)\e=o (B.4) 
which can be evaluated to 

J.9Q+P COS (, 



Xtop 



rn d4 / 2 13 cos <j, , 

■l-n 2^2^,! ^ _^ /^\2 m p-\ 

J-7T 2n^ 
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Approximations. Two convenient approximations are possible. On the one hand 
the small f3 limit yields 



Xtop 



n27l^27l' 12 



(B.6) 



On the other hand the integral can be approximated by a Gaussian one in the 
large j3 limit using 



P cos (j) = P — 



/3 



+ 



1 



(B.7) 



yielding a topological susceptibility of 



Xtop 



(B.8) 



B.3 Pure gauge plaquette 

For the pure U{1) theory in 2 dimensions it is possible to derive an analytical 
expression for the plaquette even on a finite lattice. We label link angles as (pu^ 
with I and t the site indices from 1 to L and T respectively and express the path 
integral in these variables 



htl d(j)lt2 sr^ T / n\ Jnit{(f>iti+(t>i+it2-<f>lt+ll-'f>lt2) 



[B.9) 



n 



n 27r 27r 



It " " -■• nu 

where we introduced the Fourier-transform of the exponential 



B cos (/) 



incj> 



(B.IO) 



with coefficients 
1 



g/3 cosflig-mi 



-/ e^"°'^cos(n0)c/0 = . (B.ll) 

TT JO 



In signifies the modified Bessel function [ Pre92 ]. As this expression for the path 
integral factorizes we can rewrite Z to 



n-lt It 



Htl 



,«"h9'hi-*"h-i9'hi 



^ 2tx 



27r 



It 



E 



IT \^nit,nit_i^nit,ni_itlnit 



nit L It 

This reduces Z to a simple sum 

oo 



■f3n 



n=l 



(B.12) 



(B.13) 



where we used the symmetry In{x) = I-n{x) vahd for all n and x. To finally get 
the plaquette we use the relation 



<P> = 1 + 



1 dZ 



and 



^^-^^^ - (3) + 1 



to get after some algebra to 



<P> 



MP). 



. W) W) , 



i + 2y 



(B.14) 



(B.15) 



. (B.16) 



This expression is then numerically evaluated using Fortran routines for Bessel 



functions Pre92 



B.4 Hopping parameter expansion 

We start with the distribution of the links in a dynamic fermion simulation 



P,s oc (det[l - 
with H the hopping terms (Eq. [A.22| ) and rewrite using 



det[M^ 



We expand the logarithm 



logM = log(l - kH) = -kH 



^2Trlog Af 



n'H^ K^H^ K^H^ 



+ 



(B.17) 



(B.18) 



(B.19) 



and observe that the first link-dependent term appearing in the trace is the 
term as the odd powers of H vanish in the trace Tri7 = Tri7^ = . . . = and the 
second order contribution 



TiH^ = const. + const. ■ ■ . . . = const. 



is an irrelevant constant. So we end up with 



det[M2 



e 2 



(B.20) 



(B.21) 



to leading order. 
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Writing out the if*^ term, we see that only closed loops of links can contribute 
to the trace, i.e. to leading order only plaquettes are contributing. This results in 

E (tr[(i + + - - r)]u.-,,, ui^_.^^ ui.^. 

x,ii 

+ tr[(i + 7^)(i - 7^)(i - 7'')(i + inp.-^., f/;_-_.,^ U^,, 

+ tr[(l - 7^)(1 + 7^)(1 + 7^)(1 - r)]Ul^ U^+(.-U,f. U^-f.,, 

+ tr[(l - 7'^)(1 - 7^)(1 + 7'^)(1 + 7^)][/t_^ uUi,^-^ [/,+^,, C/.,^) . (B.22) 

To evaluate the gamma matrix expressions we observe that 

trl = 2 , tr7' = Vi , 

tr7V = 25i,j, tr7V7' = Vi,j,A; (B.23) 

and that therefore all gamma terms give the same contribution 

tr [gamma terms] = 2 - 2 - 2 + tr[7''7'^7''7'^] = -4 . (B.24) 

Returning to the full expression we abbreviate the plaquette as above and obtain 

T,H' = -4 5: [Up + U\, + C/; + C/p .) . (B.25) 

As we sum over the lattice sites, we can reorder the contributions and find that 
we have each term twice for each direction. We further combine the complex 
conjugate pairs so that the final expression for the determinant is 

det[M2] = e^^"-' Re c/p , _ ^3 26) 

This first order result can be combined with the standard link action term so that 
the result is an effective pure gauge theory with a shifted beta value of 

(3' ^(3+ IGk^ . (B.27) 
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Appendix C 
Force 



C.l Force on bosonic fields 

Starting with the effective action 

P k,x,y k,x 

we want the part dependent on one specified field spinor 0^. Rewriting yields 

Sl{U,^) = const. + 0^t[^(g2_2g/,,)^,^0j] + [^ 0jt(g2_2g;,,)^j0^ 

y\yj^x y\y+x, 

+ <l^'Jif^l + 1^1 + iQ' - 2Qfi,)^M 
= const. + ^^t^fc + ^fct^^ + ^fct^^fc^^, . (C.2) 



Mapping this to a Gaussian distribution via 

we easily see heatbath updates 

<P': = {K';)-'^R'{K^x)-'Bl (C.4) 
and micro-canonical reflections steps 

R-,-R = -{Kl)m-{K',Y'^Bl 

€ = -0.' - 2(f^,^)-^^?^ (C.5) 

The necessary coefficients B^, {K^)~^, {K^)~^ can be calculated using the formulae 
in App. |A.4| giving explicit expressions for Kj^ 

= clil + AdK^) - 2^fcCo7' + + ^4 = , (C.6) 
which is not a function of the lattice site (though a Dirac matrix) and B^ 

yi^x 

= Y.[iQ' - '^Q^^k)x,y(t^l] - [cl{i + AdK^) - 2^^,colVx ■ (cr) 
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C.2 Force on links 

The force on links is defined by 

- Sl = const. + ReF^^^U^^^ (C.8) 
with the intention to apply this to the heatbath formulae 

P oc e''-'^"!' . (C.9) 
Observing that the links Ux,^ are complex numbers, we define 

cos(f) = cos(0i + 02) with e^"^! = e''^^ ^ U^,,, . (C.IO) 



\F , 



Starting with the action one easily sees 



Sl = const. - Re pU,+ J2€\Qly-^l^kQ.M- (^-H) 

k,x,y 

Expanding and using 2Re z — z + z* and the gamma matrix relations 

(7^(1 + 7"))^ = 7^(1 - 7'^) , ((1 ± 7'')(1 ± 7^^'))^ = (1 ± ± 7") (C.12) 

to combine the complex conjugate pairs and also complex conjugating one of the 
staple expressions (allowed as only the real part is taken) one obtains 



k ti' 

+'/'l+A (l-7'^)(l-7'^') 0.+A'<M' 
+0i+A'+A (l-7^')(l + 7^) 

(l + 7'^')(l+7'^) 0.t^]-A'+A,.')" 
- 4co«:EM'V(l + 7^)0g ■ (C.13) 
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Appendix D 

Root Ordering Fortran codes 



In this appendix we want to give the Fortran codes which are used for the re- 
ordering of the roots Eq. [4.19| . In the following n will always denote the degree 
of the Chebyshev polynomial and m is an integer divisor of n as needed for the 
subpolynomial scheme. 

D.l Simple pairing scheme 

We show the code for the first half of roots. The second half is constructed ana- 
logously. 

if (abs(n/4 - one*n/4) .ge. 0.01) then 

medl = n/4 + 1 ! n/4 odd 

med2 = n/4 + 1 

else 



medl = n/4 
med2 = n/4 + 1 
endif 



! n/4 even 



i=0 



do k=l,n/8 



! take complete 
! groups of 4 



1=1+1; jl(l) = k 



1=1+1 ; jl(l) = n/2-k+l 
1 = 1+1 ; jl(l) = medl+k 
1 = 1+1 ; jl(l) = med2-k 



! -> hashing table 



end do 



r = n/2 - n/2 /4*4 



If (r .ge. 2) then 
k = n/8 + 1 
1=1+1; jl(l) = k 
1=1+1 ; jl(l) = n/2-k+l 
r = r-2 



! last pair 



endlf 

If (r .ge. 1) then 



! last lonely one 



1 = 1+1 ; jl(l) = medl 
endlf 
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D.2 Subpolynomial scheme 



kstep = n / m ! \# of subpolynomials 

i =0 

do k=l,m 

do 1=1, kstep 

i = i+1 ; jl(i) = (1-1) *m + k ! reset index 

end do 
end do 



D.3 Bitreversal scheme 



do b=0,20 

if (2**b .ge. n) then 

bits=b; goto 333 
end if 
end do 
333 continue 



! length of array 



do k=l,2*n 
j(k) =0 
end do 



init 



do k=l,n 
do b=0,bits 

bit(b)=0 
end do 



init 



nn = k-1 

do b=bits-l,0,-l 

if (2**b .le. nn) then 
nn = nn - 2**b 
bit(b) = 1 
end if 
end do 



shift 



! extract bit 



i = 

do b=0,bits-l 

i = i + bit(b) * 2**(bits-l-b) 
end do 
i = i+1 

j(i) = k 
end do 



reverse 
reshift 

-> hashing table 



i = 

do k=l,2*n 

if (j (k) .ne. 0) then 
i = i+1 

jl(j(k)) = i 
end if 
end do 



! no dummy root 
! reset index 
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D.4 Montvay scheme 



read ('approxima.txt') ; 
n := 12 ; 

eps := .1000000044703483 ; 
cO := 52987.39383322105 ; 
Root : = [ 

6.299918614676031E-02 -.146958373796435 *I, 
.2375643902633147 -.2602503551965 *I, 
.483704827825326 -.313922116577824 *I, 
.7450326909011724 -.295678104222787 *I, 
.9616809154023317 -.209697801497745 *I, 
1.08401800398973 -7 . 567825958203078E-02 *I, 
1.08401800398973 + 7 . 567825958203071E-02 *I, 
.9616809154023317 + .2096978014977446 *I, 
.7450326909011726 + .2956781042227866 *I, 
.483704827825326 + .3139221165778238 *I, 
.2375643902633154 + .2602503551965006 *I, 
6.299918614676020E-02 + .146958373796435 *I ]; 

Opt imord(n, Root, cO, 1,1. ,x,eps,l. ,100, 40, yes) ; 
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Appendix E 
Gegenbauer Solver 



E.l Gegenbauer Polynomials 

We define polynomials C^{z) of the complex variable z with degree n and a real 
parameter 7 > via their generating function 

00 

{l + t'-2tz)-^ = J2t''C:{z). (E.l) 



n=0 



Alternatively, an integral representation 

C:{z) = + rd<P {sin <pf^'\z + V^^cos<Pr (E.2) 

n!r(7)^ JO 

is possible. For the many known features of the thus defined Gegenbauer poly- 
nomials like trigonometric representation, coefficients of the highest monomial, 
parity, bounds and large n approximations we defer the reader to the literature 
Bun97| , Pi)rd53|| . For completeness, we mention the first few polynomials 



Q{z) = 1 

Cj{z) = 2jz 

CUz) = 27(7 + 1)^' -7 

CUz) = ^7(7 + 1)(7 + 2)^^ -27(7 + 1)^- (E.3) 
The relative error of the partial sums is given by 

n 

R^iz) = l-(l + f-2tzrJ2t'C2iz) 

k=0 

00 

= {l + t'-2tzy Y: t'C2{z). (E.4) 

k=n+l 

The main item we want to stress is the existence of an recursion relation 

{n + l)C:^,{z) + (n + 27 - l)C:^,{z) = 2{n + 7) zC^iz) (E.5) 
enabling an efficient calculation of these polynomials. 
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E.2 Solver 



The generating function suggests the use of this polynomial expansion to build up a 
solver method. Assume M a Hermitean matrix with spectrum spec(M) C [Ai, A2] 
and < Ai < A2. We want to solve for x in 

M^x = b . (E.6) 



To map to the Gegenbauer polynomials defined in Eq. |E.1| , we transform to a 
normalised matrix A with spec (A) C [—1, 1] 

1 1 + 

M = c(l + - 2tA) ^ A = M+^—, (E.7) 

defining parameters t and c 



We can then write the solution x as 

X = M-'-'b = c-^{l+t'^ -2tA)-^b 

00 00 

= c-^^rc;i(A)6 = ^rs„ (E.9) 

n=0 n=0 

with 

= c~^C:iiA)b, (E.IO) 

i.e. as a sum over shift vectors Sn with exponentially decreasing factors. Moreover, 
the shifts have an easy recursion relation 

(n + + (n + 27 - = 2(n + 7) As^ (E.ll) 

with start shifts s_i and Sq defined by 

s_i = 

So = c"'b. (E.12) 

To obtain a valid solver, we have to find a stopping criterion, i.e we have to 

calculate (at least bounds for) the relative error Eq. |E.4| and the rest vector 

r„ = b-M''xn = RniA)b. (E.13) 
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E.3 Real solver for J = h 



For 7 = 1 one obtains a standard inverter algorithm. Most interesting are non- 
standard cases, like e.g. 7 = |, which is applicable to SUSY models ||Mon97b |. 
The general cases for 7 and z are discussed in more detail in [[Bun97|| . We for 
obvious reasons at this place only consider the real case with 7 = |- 

Legendre Polynomials The Gegenbauer polynomials in the 7 = | case are the 
Legendre polynomials 

C'J\z) ^ P^{z) . (E.14) 
In this case the integral representation is given by 



P„(cos0) = / — {z + Vz^ -1 COS (f))'' (E.15) 



IT 



with a normalisation Pn(l) = 1- The recursion is identical to Eq. |E.5 . 

The important point is that in this case the relative error can be estimated for 
real z G [—1, 1] to 



Rn{t)\<\tr^' . (E.16) 



For the proof we defer to [|Bun97|] . More involved is the estimation for \z\ > 1, 
so we briefly sketch the idea. We regard the Legendre polynomials with complex 
argument z parametrised by 

z = cosh(r + iip) . (E.17) 

Assume that z an t given by Eq. |E.8| are inside the ellipse given by 

Z = cosh{e + icp) (E.18) 

with ^ > 0, G [0, 2tt]. Then we get a uniform bound for the relative error of the 
expansion 

\Rn{z)\ < me'r^' (E.19) 

and equivalently for the error of the solver assuming A is normal with spectrum 
inside the ellipse. 

A priori, 6 is unknown. An estimate can be deducted from the shifts solving 

\\Sn\\ ll^n(^)^ll < p^(cosh0) (E.20) 



llsoll II 
e.g. by Newton-Raphson iterations. 
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Solver implementation For the Hermitcan local bosonic algorithm case M — 
Q^Pn^iQ'^) we are able to give an realistic upper bound for the spectrum 

Xma.{M)^l + S{n,e) , (E.21) 

but we have to use a guess for the unknown Xmin{M) > 

Ai = rGBePn^iroBe) ^ roBin^, + l)Ve , (E.22) 

assuming that the spectrum starts at rGBe,^"GB < 1- 

If Xmin{M) > Ai, the convergence factor is t, but a smaller value would be 
more efficient. In case of Xmin{M) < Ai, cosh^ > 1 holds. The convergence factor 
is te^. If te^ > 1, the series does not converge at all. So there exists an optimal 
choice topt which maps the extreme value A — 1 exactly to Xmin{M) 

where the expansion converges with topt- Using the knowledge of 9, we can deter- 
mine topt from 



'i-toptV _ l + t'^ -2tcoshe 



;e.24) 



\l + toptJ (1 + ^)' 

In reality, this is done by adjusting parameters t and c for the next expansion 

t = { ^ T /, and c ^ c = — (E.25) 

\ topt if ^ > (1 + t'Y ^ ' 

using the information from the last and the fact that we expect similar behaviour 
for configurations close in an updating sequence. 
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Computer realization 

1. start of basic constants 

(a) read t 

2. initialise work scalars and vectors 

(a) So = c^^b 

(b) s_i = 

(c) Xq = So 

(d) iV= llsoll 

(e) n = 

(f) mult = 1 

3. recursion 

(a) Ms ^ Mso 

(b) As ^ '-^so - ^Ms 



(c) W ^ ^As - -^s 

^ ' n+l n+1 



-1 



(d) s_i ^ So 

(e) so^W 

(f ) mult <— mult • t 

(g) Xo ^ a;o + mult • 

(h) n 4— n + 1 

(i) Q from max (%^, l) = P„(cosh^) 
4. test 

(a) if dtle")" < 5rec store t, exit 

(b) else iterate the 
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Appendix F 

4D QCD conventions 



The theory is estabhshed in d = A dimensions on a Euchdean space-time lattice 
with size xT. A gauge field U^{x) G SU (3) is assigned to the link pointing from 
point X to point (a; + /i), where /i = 0, 1, 2, 3 designates the 4 forward directions in 
space-time. The matrix defining the interaction of the fermions is 

QiU)xy = CQ-f%l + Yl'K(^^^'^^IJ.^^IJ.^i^)\)^=o,y 

- kY.{{1 - jnU,{x)6,+,,y + {1 + rWlix - fi)d,-,,y}] , (F.l) 

where k. and Cgw are parameters that have to be chosen according to the physical 
problem under consideration, cq = [cm(1 + 2dK,)]~^ and cm is a constant serving 
to optimise simulation algorithms. Typically k 1/8 and both Cg^ and cm are 
0(1). We refer to the following notation section for the definitions of the matrices 

7^*, 7^, (Tfj^iy and the anti-Hermitian tensor JF^ 



X] 



In order to speed up the Monte Carlo simulation, not the original matrix Q 
but an even-odd preconditioned matrix Q is used. We rewrite the matrix Q as 



^ 5 / 1 + Tee Meo \ /t-, r,\ 



where we introduce the matrix Tee (Too) on the even (odd) sites as 

4 



The off-diagonal parts Meo and connect the even with odd and odd with even 
lattice sites respectively. Preconditioning is now realized by writing the determi- 
nant of Q, apart from an irrelevant constant factor, as 

det[g] oc det[l +Tee]det[Q] 

Q = Co7'(l+T„„-M„e(l+Tee)-lMeo) . (F.4) 

The constant factor cq is given by cq = [cm(1 + 64^;^)]"^, and the constant cm 
is chosen such that the eigenvalues of Q are in the interval [—1, 1]. Since for the 
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simulation algorithms the eigenvalues have to be positive, we finally work with the 
matrix Q^. 

Notations in 4D 

Gamma matrices. The matrices a^^^ fj,,^ — 0, 3, are defined via the commu- 
tator of 7-matrices 

= I [r, 7l ■ (F.5) 

The 4 4 7-matrices are given by 
with the 2 (8) 2 matrices 

Co = -1 ; ej = iaj, j = 1, 2, 3 (F.7) 

and Pauli-matrices aj 

The matrix 7^ = j'^ry'^j^ry^ is thus diagonal 

J^^p{x). This antisymmetric and anti-Hermitian tensor is a function of the gauge 
links and given by 

^A^) = ^[u^,{x)U,{x + ix)Ul{x + v)Ul{x) 

+ Uy{x)Ul{x + u - ii)Ul{x - pi)U^{x - jl) 

+ Ul{x - fi)Ul{x - z> - fi)Uf,(x - TJ - jl)U^{x - u) 

+ Ul{x-u)U^{x-u)U,{x-u + fi)Ul{x) 

- h.c] . (F.IO) 
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